ABSTRACT



The invention provides an encoding apparatus capable of encoding an image signal in a scalable fashion and also provides a decoding apparatus corresponding to the encoding apparatus. In particular, the invention provides an image signal encoding apparatus for encoding a plurality of image signals, wherein at least one of the plurality of image signals is an image signal representing a moving image object, and the plurality of image signals are encoded together with a signal used to combine the image signal representing the moving image object with other image signals. The encoded signal is decoded by the decoding apparatus according to the invention. The invention is characterized in that the apparatus includes: an image supplier for supplying a base layer image signal and an enhancement layer image signal scalably representing the image signal of the moving image object; an enhancement layer encoder for encoding the enhancement layer image signal thereby generating an encoded enhancement layer signal; and a base layer encoder for encoding the base layer image signal thereby generating an encoded base layer signal. In the above encoding process, a reference image signal used to calculate a motion vector of the enhancement layer image signal to be encoded is generated by replacing the values of pixels outside the image object of the enhancement layer image signal with the values of predetermined pixels of the base layer image signal.
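The last sentence of the abstract describes how the enhancement layer reference image is formed. Below is a minimal Python sketch of that pixel substitution. It rests on several assumptions. Images are 2-D lists of pixel values. A binary key (shape) mask marks the pixels inside the object. The base layer has already been up-sampled to the enhancement layer size. The "predetermined pixels" are taken to be the co-located base layer pixels. The function name `build_reference` is hypothetical.

```python
from typing import List

Image = List[List[int]]


def build_reference(enh: Image, enh_mask: List[List[int]], base_up: Image) -> Image:
    """Build a reference image for enhancement-layer motion vector search.

    Pixels inside the object (mask value 1) keep their enhancement-layer value.
    Pixels outside the object take the co-located pixel of the base-layer image,
    which must already be up-sampled to the enhancement-layer size.
    (Using the co-located pixel is an assumption made for this sketch.)
    """
    height, width = len(enh), len(enh[0])
    if len(base_up) != height or any(len(row) != width for row in base_up):
        raise ValueError("base layer must be up-sampled to the enhancement-layer size")
    return [
        [enh[y][x] if enh_mask[y][x] else base_up[y][x] for x in range(width)]
        for y in range(height)
    ]


enh = [[10, 20], [30, 40]]
mask = [[1, 0], [0, 1]]   # 1 = inside the object
base = [[1, 2], [3, 4]]   # up-sampled base-layer pixels
print(build_reference(enh, mask, base))  # [[10, 2], [3, 40]]
```

Motion vectors searched against such a reference are not pulled toward arbitrary values outside the object. Those pixels carry base layer content instead.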

13 Claims, 54 Drawing Sheets 
FIG. 2
FIG. 5
FIG. 22



Syntax                                                    No. of bits   Mnemonic

Video Session() {
    video_session_start_code                              sc+8=32
    do* {
        Video Object()
    } while (nextbits_bytealigned() == video_object_start_code)
    next_start_code()
    video_session_end_code                                sc+8=32
}

* Concurrent loop solution to be provided by MSDL.
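The `do ... while (nextbits_bytealigned() == ...)` loops in these syntax tables depend on locating byte-aligned start codes. Elsewhere in the VOP syntax, the resync checks compare the next bits against twenty-three zero bits. That pattern, together with the `sc+8=32` entries, suggests the usual MPEG start-code prefix 0x000001 followed by an 8-bit code. The sketch below scans for start codes under that assumption. The function name is hypothetical, and no particular start-code values are implied.

```python
from typing import Iterator, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a one bit


def find_start_codes(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (byte_offset, code_byte) for each byte-aligned start code.

    bytes.find() only matches on byte boundaries, which mirrors the
    nextbits_bytealigned() checks in the syntax tables.
    """
    i = 0
    while True:
        i = data.find(START_CODE_PREFIX, i)
        if i < 0 or i + 3 >= len(data):
            return  # no further complete start code
        yield i, data[i + 3]
        i += 3  # resume the search after this prefix


stream = b"\xff\x00\x00\x01\xb0\x12\x00\x00\x01\x05"
print(list(find_start_codes(stream)))  # [(1, 176), (6, 5)]
```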




FIG. 23 




Syntax                                                    No. of bits   Mnemonic

Video Object() {
    video_object_start_code                               sc+3=27
    video_object_id                                       5
    do {
        Video Object Layer()
    } while (nextbits_bytealigned() == video_object_layer_start_code)
    next_start_code()
}
FIG. 24



Syntax                                                    No. of bits   Mnemonic

Video Object Layer() {
    video_object_layer_start_code                         sc+4=28
    video_object_layer_id                                 4
    video_object_layer_shape                              2
    if (video_object_layer_shape == "00") {
        video_object_layer_width                          10
        video_object_layer_height                         10
    }
    video_object_layer_quant_type                         1
    if (video_object_layer_quant_type) {
        load_intra_quant_mat                              1
        if (load_intra_quant_mat)
            intra_quant_mat[64]                           8*64
        load_nonintra_quant_mat                           1
        if (load_nonintra_quant_mat)
            nonintra_quant_mat[64]                        8*64
    }
    error_resilient_disable                               1
    intra_acdc_pred_disable                               1
    deblocking_filter_disable                             1
    video_object_layer_fcode_forward                      2
    video_object_layer_fcode_backward                     2
    separate_motion_shape_texture                         1
    scalability                                           1
    if (scalability) {
        ref_layer_id                                      4
        ref_layer_sampling_direc                          1
        hor_sampling_factor_n                             5
        hor_sampling_factor_m                             5
        vert_sampling_factor_n                            5
        vert_sampling_factor_m                            5
        fill_mode                                         1
    }
    do {
        Video Object Plane()
    } while (nextbits_bytealigned() == video_object_plane_start_code)
    next_start_code()
}
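A small bit reader can exercise the field widths in the FIG. 24 table. The sketch below assumes the reader is already positioned just after `video_object_layer_start_code`. It reads fields most-significant bit first and stops before the first Video Object Plane. The class `BitReader`, the helper `bits_to_bytes`, and the function `parse_vol_fields` are hypothetical names.

```python
class BitReader:
    """Minimal MSB-first reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def bits_to_bytes(bits: str) -> bytes:
    """Pack a string such as '0011 00' into bytes, zero-padding the tail."""
    bits = bits.replace(" ", "")
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def parse_vol_fields(r: BitReader) -> dict:
    """Parse the FIG. 24 fields that follow video_object_layer_start_code."""
    h = {"video_object_layer_id": r.read(4), "video_object_layer_shape": r.read(2)}
    if h["video_object_layer_shape"] == 0b00:
        h["video_object_layer_width"] = r.read(10)
        h["video_object_layer_height"] = r.read(10)
    h["video_object_layer_quant_type"] = r.read(1)
    if h["video_object_layer_quant_type"]:
        if r.read(1):  # load_intra_quant_mat
            h["intra_quant_mat"] = [r.read(8) for _ in range(64)]
        if r.read(1):  # load_nonintra_quant_mat
            h["nonintra_quant_mat"] = [r.read(8) for _ in range(64)]
    for name, width in [
        ("error_resilient_disable", 1), ("intra_acdc_pred_disable", 1),
        ("deblocking_filter_disable", 1), ("video_object_layer_fcode_forward", 2),
        ("video_object_layer_fcode_backward", 2), ("separate_motion_shape_texture", 1),
        ("scalability", 1),
    ]:
        h[name] = r.read(width)
    if h["scalability"]:
        for name, width in [
            ("ref_layer_id", 4), ("ref_layer_sampling_direc", 1),
            ("hor_sampling_factor_n", 5), ("hor_sampling_factor_m", 5),
            ("vert_sampling_factor_n", 5), ("vert_sampling_factor_m", 5),
            ("fill_mode", 1),
        ]:
            h[name] = r.read(width)
    return h


# id=3, shape '00', 176x144, no quant matrices, fcodes=1, no scalability
bits = "0011 00 0010110000 0010010000 0 1 0 0 01 01 0 0"
hdr = parse_vol_fields(BitReader(bits_to_bytes(bits)))
print(hdr["video_object_layer_width"], hdr["video_object_layer_height"])  # 176 144
```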
FIG. 25



Syntax                                                    No. of bits   Mnemonic

Video Object Plane() {
    VOP_start_code                                        sc+8=32
    do {
        modulo_time_base                                  1
    } while (modulo_time_base != "0")
    VOP_time_increment                                    10
    VOP_prediction_type                                   2
    if (video_object_layer_shape != "00") {
        VOP_width                                         10
        VOP_height                                        10
        VOP_horizontal_mc_spatial_ref                     10
        marker_bit                                        1
        VOP_vertical_mc_spatial_ref                       10
    }
    if (scalability && enhancement_type)
        background_composition                            1
    if (VOP_prediction_type == "10")
        VOP_dbquant                                       2
    else
        VOP_quant                                         5
    if (!scalability) {
        if (!separate_motion_shape_texture)
            if (error_resilience_disable)
                combined_motion_shape_texture_coding()
            else {
                do {
                    do {
                        combined_motion_shape_texture_coding()
                    } while (nextbits_bytealigned() != 0000 0000 0000 0000)
                    if (nextbits_bytealigned() != 000 0000 0000 0000 0000 0000) {
                        next_resync_marker()
                        resync_marker                     17
                        macroblock_number                 1-12
                        quant_scale                       5
                    }
                } while (nextbits_bytealigned() != 000 0000 0000 0000 0000 0000)
            }
FIG. 26

Syntax                                                    No. of bits   Mnemonic

        else {
            if (video_object_layer_shape != "00") {
                do {
                    first_MMR_code                        1-2
                } while (count of macroblocks != total number of macroblocks)
            }
            if (error_resilience_disable) {
                motion_coding()
                if (video_object_layer_shape != "00")
                    shape_coding()
                texture_coding()
            }
            else
                do {
                    motion_coding()
                    if (video_object_layer_shape != "00")
                        shape_coding()
                    texture_coding()
                    if (nextbits_bytealigned() != 000 0000 0000 0000 0000 0000) {
                        next_resync_marker()
                        resync_marker                     17
                        macroblock_number                 1-12
                        quant_scale                       5
                    }
                } while (nextbits_bytealigned() != 000 0000 0000 0000 0000 0000)
        }
    }
    else {
        if (background_composition) {
            load_backward_shape                           1
            if (load_backward_shape)
                backward_shape_coding()
            load_forward_shape                            1
            if (load_forward_shape)
                forward_shape_coding()
        }
        ref_select_code                                   2
FIG. 27



Syntax                                                    No. of bits   Mnemonic

        if (VOP_prediction_type == "01" || VOP_prediction_type == "10") {
            forward_temporal_ref                          10
            if (VOP_prediction_type == "10") {
                marker_bit                                1
                backward_temporal_ref                     10
            }
        }
        combined_motion_shape_texture_coding()
    }
    next_start_code()
}
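In FIG. 25, `modulo_time_base` is coded as a run of `1` bits ended by a `0`, and fixed-width fields follow it. The sketch below parses the start of a VOP header under three assumptions. The layer is rectangular (`video_object_layer_shape == "00"`). Scalability is off, so the optional shape and `background_composition` fields are absent. The reader sits just after `VOP_start_code`. The names `BitReader` and `parse_vop_start` are hypothetical.

```python
class BitReader:
    """Minimal MSB-first reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def parse_vop_start(r: BitReader) -> dict:
    """Parse modulo_time_base through VOP_quant/VOP_dbquant (rectangular, non-scalable)."""
    ones = 0
    while r.read(1) == 1:  # count '1' bits until the terminating '0'
        ones += 1
    h = {
        "modulo_time_base": ones,
        "VOP_time_increment": r.read(10),
        "VOP_prediction_type": r.read(2),
    }
    if h["VOP_prediction_type"] == 0b10:
        h["VOP_dbquant"] = r.read(2)
    else:
        h["VOP_quant"] = r.read(5)
    return h


# '110' -> modulo_time_base 2; increment 5; prediction type '10'; dbquant 3
data = int("11000000001011011" + "0000000", 2).to_bytes(3, "big")
print(parse_vop_start(BitReader(data)))
# {'modulo_time_base': 2, 'VOP_time_increment': 5, 'VOP_prediction_type': 2, 'VOP_dbquant': 3}
```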
FIG. 36
FIG. 43



Syntax                                                    No. of bits   Mnemonic

Video Object Layer() {
    video_object_layer_start_code                         sc+4=28
    video_object_layer_id                                 4
    video_object_layer_shape                              2
    if (video_object_layer_shape == "00") {
        video_object_layer_width                          10
        video_object_layer_height                         10
    }
    video_object_layer_quant_type                         1
    if (video_object_layer_quant_type) {
        load_intra_quant_mat                              1
        if (load_intra_quant_mat)
            intra_quant_mat[64]                           8*64
        load_nonintra_quant_mat                           1
        if (load_nonintra_quant_mat)
            nonintra_quant_mat[64]                        8*64
    }
    error_resilient_disable                               1
    intra_acdc_pred_disable                               1
    deblocking_filter_disable                             1
    video_object_layer_fcode_forward                      2
    video_object_layer_fcode_backward                     2
    separate_motion_shape_texture                         1
    scalability                                           1
    if (scalability) {
        ref_layer_id                                      4
        ref_layer_sampling_direc                          1
        hor_sampling_factor_n                             5
        hor_sampling_factor_m                             5
        vert_sampling_factor_n                            5
        vert_sampling_factor_m                            5
        enhancement_type                                  1
        fill_mode                                         1
    }
    do {
        Video Object Plane()
    } while (nextbits_bytealigned() == video_object_plane_start_code)
    next_start_code()
}
METHOD AND APPARATUS FOR
ENCODING ENHANCEMENT AND BASE
LAYER IMAGE SIGNALS USING A
PREDICTED IMAGE SIGNAL

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to an image signal encoding method and an image signal encoding apparatus, an image signal decoding method and an image signal decoding apparatus, an image signal transmission method, and a recording medium. These are suitable for use in three kinds of systems. The first kind records a moving image signal on a recording medium such as a magneto-optical disk or a magnetic tape and reproduces the moving image signal from the recording medium, so that the reproduced image is displayed on a display device. The second kind, such as a video conference system, a video telephone system, broadcasting equipment, or a multimedia database retrieval system, transmits a moving image signal via a transmission line from a transmitting end to a receiving end, so that the transmitted moving image is displayed on a display device at the receiving end. The third kind edits and records a moving image signal.

2. Description of the Related Art 

In the art of moving-image transmission systems such as video conference systems or video telephone systems, it is known to convert an image signal into a compressed code on the basis of line-to-line and/or frame-to-frame correlation of the image signal so as to use a transmission line in a highly efficient fashion.

The encoding technique according to the MPEG (Moving Picture Experts Group) standard provides high compression efficiency and is widely used. The MPEG technique is a hybrid of motion prediction encoding and DCT (discrete cosine transform) encoding.
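The DCT half of this hybrid operates on small square blocks, 8x8 pixels in MPEG. The sketch below shows a direct 2-D DCT-II in orthonormal form and its inverse, written for clarity rather than speed. The O(N^4) loops stand in for the fast factorizations real codecs use. The function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]


def _c(k: int, n: int) -> float:
    """Orthonormal scale factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_2d(block: Block) -> Block:
    """Forward 2-D DCT-II of an N x N block (direct form)."""
    n = len(block)
    return [
        [
            _c(u, n) * _c(v, n) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for y in range(n)
                for x in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def idct_2d(coef: Block) -> Block:
    """Inverse of dct_2d (2-D DCT-III with the same scaling)."""
    n = len(coef)
    return [
        [
            sum(
                _c(u, n) * _c(v, n) * coef[u][v]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for u in range(n)
                for v in range(n)
            )
            for x in range(n)
        ]
        for y in range(n)
    ]


flat = [[100.0] * 8 for _ in range(8)]
coef = dct_2d(flat)
print(round(coef[0][0], 6))  # 800.0: a flat block puts all its energy in the DC term
```

The compression gain comes from this energy compaction: smooth image areas leave most AC coefficients near zero, and those coefficients quantize to nothing.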

In the MPEG standard, several profiles (functions) at 
various levels (associated with the image size or the like) are 
defined so that the standard can be applied to a wide variety 
of applications. Of these, the most basic one is the main 
profile at main level (MP@ML). 

Referring to FIG. 44, an example of an encoder (image signal encoder) according to the MP@ML of the MPEG standard will be described below. An input image signal is supplied to a set of frame memories 1 and stored therein in a predetermined order. The image data to be encoded is applied, in units of macroblocks, to a motion vector extraction circuit (ME) 2. The motion vector extraction circuit 2 processes the image data of each frame as an I-picture, a P-picture, or a B-picture according to a predetermined procedure. In this procedure, the processing mode is predefined for each frame of the image sequence, and each frame is processed as an I-picture, a P-picture, or a B-picture corresponding to the predefined processing mode (for example, frames are processed in the order I, B, P, B, P, . . . , B, P). Basically, I-pictures are subjected to intraframe encoding, and P-pictures and B-pictures are subjected to interframe prediction encoding. The encoding mode for P-pictures and B-pictures is, however, varied adaptively macroblock by macroblock in accordance with the prediction mode, as will be described later.
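The frame-level type assignment in the example order I, B, P, B, P, . . . , B, P can be sketched as follows. The group length of 15 and the anchor spacing of 2 are hypothetical parameters, chosen so that the output matches that example. The sketch covers only frame-level types. The per-macroblock mode switching mentioned above is not modelled.

```python
from typing import List


def picture_types(num_frames: int, gop_length: int = 15, anchor_distance: int = 2) -> List[str]:
    """Assign I/P/B picture types in display order.

    The first frame of each group is an I-picture, every anchor_distance-th
    frame after it is a P-picture, and the frames in between are B-pictures.
    """
    types = []
    for i in range(num_frames):
        k = i % gop_length
        if k == 0:
            types.append("I")
        elif k % anchor_distance == 0:
            types.append("P")
        else:
            types.append("B")
    return types


print(" ".join(picture_types(7)))  # I B P B P B P
```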

The motion vector extraction circuit 2 extracts a motion vector with reference to a predetermined reference frame so as to perform motion compensation (interframe prediction). The motion compensation (interframe prediction) is performed in one of three modes: forward, backward, and forward-and-backward prediction modes. The prediction for a P-picture is performed only in the forward prediction mode, while the prediction for a B-picture is performed in any one of the three modes described above. The motion vector extraction circuit 2 selects the prediction mode that leads to the minimum prediction error, and generates a predicted vector in the selected prediction mode.
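The text does not say how the motion vector is searched. A common approach is full-search block matching, sketched below. The sketch assumes a sum-of-absolute-differences (SAD) error measure, integer-pel displacements, a 16x16 block, and a search range of ±7; all of these are assumptions. The helper `select_mode` shows the final step of keeping the mode with the smallest error. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def sad(cur: Image, ref: Image, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur: Image, ref: Image, bx: int, by: int,
                size: int = 16, search_range: int = 7) -> Optional[Tuple[int, int, int]]:
    """Return (cost, dx, dy) of the best integer-pel match, or None if none fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that fall outside the reference frame
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def select_mode(errors: dict) -> str:
    """Pick the prediction mode whose prediction error is smallest."""
    return min(errors, key=errors.get)


ref = [[10 * y + x for x in range(8)] for y in range(8)]
# current frame = reference content displaced by dx=-1, dy=+1
cur = [[ref[y + 1][x - 1] if y + 1 < 8 and x >= 1 else 0 for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, size=4, search_range=2))  # (0, -1, 1)
print(select_mode({"forward": 120, "backward": 95, "bidirectional": 101}))  # backward
```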

The prediction error is compared, for example, with the dispersion of the given macroblock to be encoded. If the dispersion of the macroblock is smaller than the prediction error, prediction compensation encoding is not performed on that macroblock; instead, intraframe encoding is performed. In this case, the prediction mode is referred to as an intraframe encoding mode. The motion vector extracted by the motion vector extraction circuit 2 and the information indicating the prediction mode employed are supplied to a variable-length encoder 6 and a motion compensation circuit (MC) 12.
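The intraframe/interframe decision described above can be sketched as follows. The sketch assumes that "dispersion" means the sum of squared deviations from the block mean and that the prediction error is the sum of squared differences, so that both quantities are in the same units. Real encoders may use other measures or add a bias. The function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def dispersion(block: Block) -> float:
    """Sum of squared deviations of the pixels from the block mean."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels)


def prediction_error(block: Block, predicted: Block) -> int:
    """Sum of squared differences between the block and its prediction."""
    return sum(
        (b - p) ** 2
        for brow, prow in zip(block, predicted)
        for b, p in zip(brow, prow)
    )


def macroblock_mode(block: Block, predicted: Block) -> str:
    """Choose 'intra' when the block's own dispersion is below the prediction error."""
    return "intra" if dispersion(block) < prediction_error(block, predicted) else "inter"


print(macroblock_mode([[5, 5], [5, 5]], [[0, 0], [0, 0]]))      # intra
print(macroblock_mode([[0, 10], [10, 0]], [[0, 10], [10, 0]]))  # inter
```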

The motion compensation circuit 12 generates a predicted 
image on the basis of the motion vector. The result is applied 
to arithmetic operation circuits 3 and 10. The arithmetic 
operation circuit 3 calculates the difference between the 
value of the given macroblock to be encoded and the value 
of the predicted image. The result is supplied as a difference 
image signal to a DCT circuit 4. In the case of an 
intramacroblock, the arithmetic operation circuit 3 directly 
transfers the value of the given macroblock to be encoded to 
the DCT circuit 4 without performing any operation. 

The DCT circuit 4 performs a DCT (discrete cosine transform) operation on the input signal thereby generating DCT coefficients. The resultant DCT coefficients are supplied to a quantization circuit (Q) 5. The quantization circuit 5 quantizes the DCT coefficients in accordance with a quantization step depending on the amount of data stored in a transmission buffer 7. The quantized data is then supplied to the variable-length encoder 6.
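The effect of the quantization step can be illustrated with a plain uniform quantizer. This is a simplified stand-in. MPEG's actual quantization and inverse quantization rules, with weighting matrices, separate intra DC handling, and specific rounding, are more involved and are not reproduced here. The function names are hypothetical.

```python
from typing import List


def quantize(coefs: List[float], step: float) -> List[int]:
    """Uniform quantization; int() truncates toward zero, giving a small dead zone."""
    return [int(c / step) for c in coefs]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Reconstruct approximate coefficient values from quantized levels."""
    return [level * step for level in levels]


coefs = [100.0, -37.0, 3.0]
for step in (4, 8):  # a coarser step yields smaller levels and fewer bits
    levels = quantize(coefs, step)
    print(step, levels, dequantize(levels, step))
```

A larger step shrinks the levels, so the variable-length encoder produces fewer bits, at the cost of a larger reconstruction error.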

The variable-length encoder 6 converts the quantized data supplied from the quantization circuit 5 into a variable-length code using, for example, the Huffman encoding technique, in accordance with the quantization step (scale) supplied from the quantization circuit 5. The obtained variable-length code is supplied to a transmission buffer 7.

The variable-length encoder 6 also receives the quantiza- 
tion step (scale) from the quantization circuit 5 and the 
motion vector as well as the information indicating the 
employed prediction mode (the intraframe prediction mode, 
the forward prediction mode, the backward prediction mode, 
or forward-and-backward prediction mode in which the 
prediction has been performed) from the motion vector 
extraction circuit 2, and converts these received data into 
variable-length codes. 
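MPEG itself uses fixed variable-length code tables defined in the standard rather than building a Huffman code per stream. The sketch below only illustrates the underlying idea: a Huffman-style prefix code gives shorter codewords to more frequent symbols. The symbol frequencies are made up for the example, and the function name is hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code(freqs: Dict[str, int]) -> Dict[str, str]:
    """Build a prefix code in which frequent symbols get shorter codewords."""
    if not freqs:
        return {}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit codeword
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


table = huffman_code({"a": 5, "b": 2, "c": 1, "d": 1})
print({s: len(c) for s, c in sorted(table.items())})  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print("".join(table[s] for s in "abcd"))
```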

The transmission buffer 7 stores the received encoded image data temporarily. A quantization control signal corresponding to the amount of data stored in the transmission buffer 7 is fed back to the quantization circuit 5 from the transmission buffer 7.
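The buffer feedback can be sketched as a simple rule on buffer occupancy: a nearly full buffer raises the quantization scale, and a nearly empty one lowers it. The 80% and 20% thresholds and the step of 1 are hypothetical. The range 1 to 31 is chosen to match the 5-bit `VOP_quant` and `quant_scale` fields in the syntax tables. The function name is hypothetical.

```python
def update_quant_scale(q: int, fullness: int, capacity: int,
                       upper: float = 0.8, lower: float = 0.2,
                       q_min: int = 1, q_max: int = 31) -> int:
    """Adjust the quantization scale from transmission buffer occupancy.

    Near the upper limit: coarser quantization, so fewer bits are produced.
    Near the lower limit: finer quantization, so more bits are produced.
    """
    occupancy = fullness / capacity
    if occupancy >= upper:
        return min(q + 1, q_max)
    if occupancy <= lower:
        return max(q - 1, q_min)
    return q


print(update_quant_scale(10, 900, 1000))  # 11
print(update_quant_scale(10, 100, 1000))  # 9
```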

If the amount of residual data stored in the transmission 
buffer 7 reaches an upper allowable limit, the transmission 
buffer 7 generates a quantization control signal to the 
quantization circuit 5 so that the following quantization 
operation is performed using an increased quantization scale 
thereby decreasing the amount of quantized data. 
Conversely, if the amount of residual data decreases to a 
lower allowable limit, the transmission buffer 7 generates a 
quantization control signal to the quantization circuit 5 so 
that the following quantization operation is performed using a decreased quantization scale, thereby increasing the amount of quantized data. In this way, an overflow or underflow in the transmission buffer 7 is prevented.

The data stored in the transmission buffer 7 is read out at a specified time and output over a transmission line or recorded on a recording medium.

The quantized data output by the quantization circuit 5 is also supplied to an inverse quantization circuit 8. The inverse quantization circuit 8 performs inverse quantization on the received data in accordance with the quantization step given by the quantization circuit 5. The data (DCT coefficients generated by means of the inverse quantization) output by the inverse quantization circuit 8 are supplied to an IDCT (inverse DCT) circuit 9, which in turn performs an inverse DCT operation on the received data. The arithmetic operation circuit 10 adds the predicted image signal to the signal output from the IDCT circuit 9 for each macroblock, and stores the resultant signal into a set of frame memories (FM) 11 so that the stored image signal will be used as the predicted image signal. In the case of an intramacroblock, the arithmetic operation circuit 10 directly transfers the macroblock output by the IDCT circuit 9 to the set of frame memories (FM) 11 without performing any operation.

With reference to FIG. 45, an example of a decoder (image signal decoder) for performing a decoding operation according to the MP@ML standard of the MPEG will be described below. Coded image data transmitted via the transmission line is received by a receiving circuit (not shown) or is reproduced by a reproducing apparatus. The coded image data is stored temporarily in a receiving buffer 21 and then supplied to a variable-length decoder (IVLC) 22. The variable-length decoder 22 performs an inverse variable-length encoding operation on the data supplied from the receiving buffer 21. The variable-length decoder 22 outputs a motion vector and information indicating the associated prediction mode to a motion compensation circuit 27. The variable-length decoder 22 also supplies a quantization step to the inverse quantization circuit 23. Furthermore, the variable-length decoded data is supplied from the variable-length decoder 22 to the inverse quantization circuit 23.

The inverse quantization circuit 23 performs an inverse quantization operation on the quantized data supplied from the variable-length decoder 22 using the quantization step supplied from the variable-length decoder 22, and supplies the resultant signal to an IDCT circuit 24. The IDCT circuit 24 performs an inverse DCT process on the data (DCT coefficients) output by the inverse quantization circuit 23, and supplies the resultant data to an arithmetic operation circuit 25.

When the image signal output by the IDCT circuit 24 is I-picture data, the data is stored via the arithmetic operation circuit 25 in a set of frame memories 26 so that predicted image data can be produced later for use in processing an image signal input to the arithmetic operation circuit 25. The data output by the arithmetic operation circuit 25 is also output as a reproduced image signal to the outside.

In the case where the input bit stream is a P- or B-picture signal, the motion compensation circuit 27 generates a predicted image from the image signal stored in the set of frame memories 26 in accordance with the motion vector and the associated prediction mode supplied from the variable-length decoder 22, and outputs the resultant predicted image to the arithmetic operation circuit 25. The arithmetic operation circuit 25 adds the predicted image signal supplied from the motion compensation circuit 27 to the image signal received from the IDCT circuit 24, thereby creating an output image signal. In the case where the given image signal is a P-picture, the output signal of the arithmetic operation circuit 25 is stored in the set of frame memories 26 so that it can be used as a reference image signal in processing a subsequent image signal to be decoded. In the case of an intramacroblock, the signal is simply output via the arithmetic operation circuit 25 without being subjected to any process.

In the MPEG standard, various profiles at various levels are defined, and various tools are available. For example, scalability is available as one of these tools. The scalability of the MPEG encoding technique makes it possible to encode various image signals having different image sizes at various frame rates. For example, in the case of spatial scalability, when only a base layer bit stream is decoded, an image signal having a small image size may be decoded, while an image signal having a large image size may be decoded if both base layer and enhancement layer bit streams are decoded.

With reference to FIG. 46, an example of an encoder having spatial scalability will be described below. In spatial scaling, an image signal having a small image size is given as a base layer signal, while an image signal having a large image size is given as an enhancement layer signal.

The image signal in the base layer is first stored in a set of frame memories 1, and then is encoded in a manner similar to the MP@ML signal described above, except that the output signal of an arithmetic operation circuit 10 is supplied not only to a set of frame memories 11, so that it is used as a prediction reference image signal in the base layer, but also to an up sampling circuit 31. The up sampling circuit 31 expands the received image signal supplied from the arithmetic operation circuit 10 up to an image size equal to the image size in the enhancement layer so that it is used as a prediction reference image signal in the enhancement layer.

On the other hand, the image signal in the enhancement layer is first stored in a set of frame memories 51. A motion vector extraction circuit 52 extracts a motion vector and determines a prediction mode, in a manner similar to the operation according to the MP@ML. A motion compensation circuit 62 generates a predicted image signal using the motion vector in the prediction mode determined by the motion vector extraction circuit 52. The resultant signal is supplied to a weighting circuit (W) 34. The weighting circuit 34 multiplies the predicted image signal by a weighting factor W, and outputs the resultant signal to an arithmetic operation circuit 33.

The signal output from the arithmetic operation circuit 10, as described above, has been supplied to the up sampling circuit 31. The up sampling circuit 31 expands the image signal generated by the arithmetic operation circuit 10 up to a size equal to that of the image in the enhancement layer. The expanded image signal is supplied to a weighting circuit (1-W) 32. The weighting circuit 32 multiplies the image signal output from the up sampling circuit 31 by a weighting factor 1-W, and supplies the resultant signal to the arithmetic operation circuit 33.

The arithmetic operation circuit 33 generates a predicted image signal by adding together the image signals output by the weighting circuits 32 and 34, and outputs the resultant signal to an arithmetic operation circuit 53. The image signal output by the arithmetic operation circuit 33 is also input to an arithmetic operation circuit 60. The arithmetic operation
circuit 60 adds together the image signal output by the arithmetic operation circuit 33 and an image signal output by an inverse DCT circuit 59. The resultant signal is stored in a set of frame memories 61 so that it is used as a predicted reference frame for the subsequent image signal to be encoded.
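The enhancement layer prediction described above can be sketched as a weighted sum: W times the motion-compensated enhancement prediction, plus (1-W) times the up-sampled base layer image. The residual is then formed from that sum. The text does not specify the up-sampling filter. A 2x pixel-replication filter is assumed here purely for illustration. The function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def upsample2x(img: Image) -> Image:
    """Double width and height by pixel replication (a deliberately simple filter)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def weighted_prediction(mc_pred: Image, base_up: Image, w: float) -> Image:
    """W * (enhancement-layer MC prediction) + (1 - W) * (up-sampled base layer)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weighting factor W must lie in [0, 1]")
    return [
        [w * m + (1.0 - w) * b for m, b in zip(mrow, brow)]
        for mrow, brow in zip(mc_pred, base_up)
    ]


def residual(cur: Image, pred: Image) -> Image:
    """Difference image that is passed on to the DCT for non-intra macroblocks."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(cur, pred)]


base_up = upsample2x([[20.0, 40.0]])            # 2 x 4 image
mc_pred = [[10.0, 10.0, 50.0, 50.0]] * 2
pred = weighted_prediction(mc_pred, base_up, w=0.25)
print(pred[0])  # [17.5, 17.5, 42.5, 42.5]
```

W = 1 reduces to ordinary motion-compensated prediction within the enhancement layer. W = 0 predicts purely from the up-sampled base layer.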

The arithmetic operation circuit 53 calculates the difference between the image signal to be encoded and the image signal output from the arithmetic operation circuit 33, and outputs the result as a difference image signal. However, in the case where the macroblock is to be processed in the intraframe encoding mode, the arithmetic operation circuit 53 directly supplies the image signal to be encoded to a DCT circuit 54 without performing any operation.

The DCT circuit 54 performs a DCT (discrete cosine transform) operation on the image signal output by the arithmetic operation circuit 53 thereby generating DCT coefficients. The generated DCT coefficients are supplied to a quantization circuit 55. The quantization circuit 55 quantizes the DCT coefficients, as in the operation for the MP@ML data, using a quantization scale determined in accordance with the amount of data stored in a transmission buffer 57. The resultant quantized data is supplied to a variable-length encoder 56. The variable-length encoder 56 performs a variable-length encoding operation on the quantized data (quantized DCT coefficients), and outputs the resultant data as an enhancement layer bit stream via the transmission buffer 57.

The quantized data from the quantization circuit 55 is also supplied to an inverse quantization circuit 58. The inverse quantization circuit 58 performs an inverse quantization operation on the received data using the same quantization scale as that employed by the quantization circuit 55. The resultant data is supplied to an inverse DCT circuit 59 and is subjected to an inverse DCT process. The result is supplied to the arithmetic operation circuit 60. The arithmetic operation circuit 60 adds together the image signal output from the arithmetic operation circuit 33 and the image signal output from the inverse DCT circuit 59, and stores the resultant signal in the set of frame memories 61.

The variable-length encoder 56 also receives the enhancement layer motion vector extracted by the motion vector extraction circuit 52 and the information indicating the associated prediction mode, the quantization scale employed by the quantization circuit 55, and the weighting factor W used by the weighting circuits 32 and 34. These data are encoded by the variable-length encoder 56, and the resultant data is output. Then, an enhancement layer bit stream and a base layer bit stream are multiplexed by a multiplexer (not shown) and output via a transmission line or recorded on a recording medium.

Now referring to FIG. 47, an example of a decoder having the capability of spatial scaling will be described below. The base layer bit stream input to a reception buffer 21 is decoded in a similar manner to the MP@ML signal described above, except that the output image signal of an arithmetic operation circuit 25 is not only supplied as a base layer image signal to the outside but also stored in the set of frame memories 26 so that it can be used as a prediction reference image signal in processing a subsequent image signal to be decoded. Furthermore, the output image signal of the arithmetic operation circuit 25 is also supplied to an up sampling circuit 81 so as to expand the image signal to an image size equal to the image size in the enhancement layer so that it is used as a prediction reference image signal in the enhancement layer.
On the other hand, the enhancement layer bit stream is stored in a reception buffer 71, and then supplied to a variable-length decoder 72. The variable-length decoder 72 performs a variable-length decoding operation on the received data thereby generating quantized DCT coefficients, a quantization scale, an enhancement layer motion vector, prediction mode data, and a weighting factor W. The variable-length decoded data output from the variable-length decoder 72 are supplied to an inverse quantization circuit 73. The inverse quantization circuit 73 performs an inverse quantization operation on the received data using the quantization scale. The resultant data is supplied to an inverse DCT circuit 74, and is subjected to an inverse DCT process. The resultant image signal is supplied to an arithmetic operation circuit 75.

The motion compensation circuit 77 generates a predicted image signal according to the decoded motion vector and prediction mode, and supplies the resultant signal to a weighting circuit 84. The weighting circuit 84 multiplies the output signal of the motion compensation circuit 77 by the decoded weighting factor W, and supplies the result to an arithmetic operation circuit 83.

The output image signal of the arithmetic operation circuit 25 is output as a reproduced base layer image signal, and also supplied to the set of frame memories 26. Furthermore, the image signal output from the arithmetic operation circuit 25 is also supplied to the up sampling circuit 81 so as to expand it to an image size equal to the image size in the enhancement layer. The expanded image signal is then supplied to a weighting circuit 82. The weighting circuit 82 multiplies the image signal output from the up sampling circuit 81 by the weighting factor (1-W), where W is the decoded weighting factor, and supplies the resultant signal to the arithmetic operation circuit 83.

The arithmetic operation circuit 83 adds together the output image signals of the weighting circuits 82 and 84, and supplies the result to the arithmetic operation circuit 75. The arithmetic operation circuit 75 adds the image signal output from the inverse DCT circuit 74 and the image signal output from the arithmetic operation circuit 83, thereby generating a reproduced enhancement layer image, which is supplied not only to the outside but also to a set of frame memories 76. The signal stored in the set of frame memories 76 is used as a prediction reference image signal in a later process to decode a subsequent image signal.

Although the above description deals with the operation of processing a luminance signal, the operation associated with a color difference signal is also performed in a similar manner, except that the motion vector used for the luminance signal is reduced to half in both the vertical and horizontal directions.
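Deriving the color difference motion vector from the luminance one is a per-component halving. The source does not state how odd components are rounded. Truncation toward zero is assumed below, and the vector is taken to be in whatever sub-pixel units the codec uses. The function name is hypothetical.

```python
from typing import Tuple


def chroma_motion_vector(mv_luma: Tuple[int, int]) -> Tuple[int, int]:
    """Halve a luminance motion vector for the color difference planes.

    int(c / 2) truncates toward zero, so -3 becomes -1 rather than -2
    (the rounding rule is an assumption of this sketch).
    """
    dx, dy = mv_luma
    return int(dx / 2), int(dy / 2)


print(chroma_motion_vector((6, -4)))  # (3, -2)
print(chroma_motion_vector((3, -3)))  # (1, -1)
```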

In addition to the MPEG standard, there are various standards for converting a moving image signal into a compressed code in a highly efficient manner. For example, the H.261 and H.263 standards established by the ITU-T are employed in encoding processes, especially for communication. Although there are some differences in details such as header information, the H.261 and H.263 standards are also based on the combination of motion compensation prediction encoding and DCT encoding, and thus an encoder and a decoder can be implemented in a similar manner to those described above.

It is also known in the art to compose an image by combining a plurality of images using a chromakey. In this technique, an image of an object is taken in front of a background having a particular uniform color such as blue.
Areas having colors other than blue are extracted from the image, and the extracted image is combined with another image. In the above process, the signal representing the extracted areas is referred to as a key signal.

FIG. 48 illustrates the method of encoding a composite image signal. In FIG. 48, a background image F1 and a foreground image F2 are combined into a single image. The foreground image F2 is obtained by taking a picture of an object in front of a background having a particular color, and then extracting the areas having colors different from the background color. The extracted areas are represented by a key signal K1. A composite image F3 is obtained by combining the foreground image F2 and the background image F1 using the key signal K1. Then the composite image F3 is encoded according to an appropriate encoding technique such as the MPEG encoding technique. When the composite image is encoded, the information of the key signal is lost. Therefore, when the decoded composite image is edited or recomposed, it is difficult to change only the background image F1 while keeping the foreground image F2 unchanged.

Instead, as shown in FIG. 49, the background image F1, the foreground image F2, and the key signal K1 may first be encoded separately, and then the respective encoded signals may be multiplexed into a single bit stream of a composite image F3.

FIG. 50 illustrates the technique of decoding the bit stream produced in the manner shown in FIG. 49 into a composite image F3. The bit stream is subjected to a demultiplexing process and is decomposed into separate bit streams of the image F1, the image F2, and the key signal K1, respectively. These bit streams are decoded separately so as to obtain a decoded image F1', a decoded image F2', and a decoded key signal K1'. If the decoded image F1' is combined with the decoded image F2' using the decoded key signal K1', then it is possible to obtain a decoded composite image F3'. In this technique, re-editing or recomposition can easily be carried out. For example, it is possible to change only the background image F1 while keeping the foreground image F2.

In the following description, a sequence of images, such as the images F1 and F2 constituting a composite image, is referred to as a VO (video object). An image frame of a VO at a certain time is referred to as a VOP (video object plane).

In the case of an image signal generated using a chromakey such as that shown in FIG. 49, the image signals VO-0 to VO-n and associated key signals output from the VO generator 101 are directly used as image signals of the respective VOs and associated key signals. When an image has no key signal or the key signal of the image is lost, a key signal is generated by extracting predetermined areas by means of an image area division technique, thereby generating a VO.

Each VOP generator 102-0 to 102-n extracts from each image frame a minimum rectangle containing an object in the image, wherein the size of the rectangle is selected such that the number of pixels in the vertical direction and the number in the horizontal direction are integral multiples of 16. The respective VOP generators 102-0 to 102-n then extract the image signal (luminance signal and color difference signal) and the key signal included in the corresponding rectangles, and output the extracted signals. The VOP generators also output a flag indicating the size of the VOPs and a flag indicating the position of the VOPs represented in absolute coordinates.

The output signals of the respective VOP generators 102-0 to 102-n are input to corresponding VOP encoders 103-0 to 103-n and encoded. The output signals of the VOP encoders 103-0 to 103-n are input to a multiplexer 104 and combined into a single bit stream.

When the bit stream containing multiplexed signals is input to the decoder shown in FIG. 52, the input bit stream is first demultiplexed by a demultiplexer 111 into separate bit streams associated with the respective VOs. The respective VO bit streams are input to corresponding VOP decoders 112-0 to 112-n and decoded. Thus, the image signals, the key signals, the flags indicating the VOP sizes, and the flags indicating the positions of the VOPs represented in absolute coordinates are reproduced by the respective VOP decoders 112-0 to 112-n. The reproduced signals are input to an image reconstruction circuit 113. The image reconstruction circuit 113 generates a reproduced image using the image signals, key signals, size flags, and absolute coordinate position flags associated with the respective VOPs.

Referring to FIGS. 53 and 54, examples of the constructions of the VOP encoder 103-0 and the VOP decoder 112-0

Each VOP consists of a luminance signal, a color difference 45 are described below. In FIG. 53, The image signal and the 

signal, and a key signal. key signal of each VOP are input to an image signal encoder 

An image frame refers to one image at a certain time. An 121 and a key signal encoder 122, respectively. The image 

image sequence is a set of image frames taken at various signal encoder 121 encodes the image signal according to 

times. That is, each VO is a set of VOPs at various times. foi" example the MPEG or H.263 standard. The key signal 

The size and position of each VO vary with time. That is, 50 encoder 1L22 encodes the received key signal by means of 

evenif VOPs are included in the same VO, they can be differ example DPCM. Alternatively, motion compensation 

in size and position from one another. associated with the key signal may be performed using the 

HGS. 51 and 52 illustrate an encoder and decoder, motion vector detected by the image signal encoder 121, and 

respectively, according to the present technique. An image obtained differential signal may be encoded. The amount 

signal is first input to a VO generator 101. The VO generator 55 generated in the key signal encoding is input to the 

101 decomposes the input signal into a background image ^3.g& signal encoder 121 and is controlled so that the bit 

signal, an image signal of each object, and an associated key ^^^^ maintained at a predetermined value, 

signal. Each VO consists of an image signal and a key The bit stream of the encoded image signal (motion vector 

signal. The respective VOs of image signals output from the and texture information) and the bit stream of the encoded 

VO generator 101 are input to corresponding VOP genera- 60 key signal are input to a multiplexer 123 and combined into 

tors 102-0 to 102-n. For example, the image signal and the a single bit stream. The resultant bit stream is output via a 

key signal of Vo-0 are input to the VOP generator 102-0, and transmission buffer 124. 

the image signal and the key signal of Vo-1 are input to the When the bit stream is input to the VOP decoder shown 

VOP generator 102-1. Similarly, the image signal and the in FIG. 54, the bit stream is first applied to a demultiplexer 

key signal of Vo-n are input to the VOP generator 102-n, 65 131. The Demultiplexer 131 demultiplexes the received bit 

When the image signal represents a backgroimd, there is no stream into the bit stream of the image signal (motion vector 

key signal. and texture information) and the bit stream of the key signal. 
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which are then decoded by an image signal decoder 132 and 
a key signal decoder 133, respectively. In the case where the 
key signal is encoded by means of motion compensation, the 
motion vector decoded by the image signal decoder 132 is 
input to the key signal decoder 133 so that the key signal 5 
decoder 133 can decode the key signal using the motion 
vector. 
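The FIG. 50 decoder and the image reconstruction circuit 113 both combine a foreground with a background under control of a key signal. The text does not give the blending formula, so the sketch below assumes a soft key normalised to [0, 1] and the linear blend F3 = K1·F2 + (1 − K1)·F1; `composite` and `Image` are illustrative names, not taken from the source. It also shows the recomposition benefit described above: the background is swapped while F2 and K1 stay unchanged.

```python
from typing import List

# Hypothetical alias: a 2-D image as rows of pixel values.
Image = List[List[float]]


def composite(background: Image, foreground: Image, key: Image) -> Image:
    """Blend a foreground over a background using its key signal.

    Assumption: the key is normalised to [0.0, 1.0]; 1.0 means the pixel
    belongs to the foreground object, 0.0 means background, and values in
    between give a soft edge.
    """
    return [
        [k * f + (1.0 - k) * b for b, f, k in zip(b_row, f_row, k_row)]
        for b_row, f_row, k_row in zip(background, foreground, key)
    ]


f1 = [[10.0, 10.0], [10.0, 10.0]]      # background image F1
f2 = [[200.0, 200.0], [200.0, 200.0]]  # foreground image F2
k1 = [[1.0, 0.0], [0.5, 0.0]]          # key signal K1
new_background = [[50.0, 50.0], [50.0, 50.0]]

print(composite(f1, f2, k1))              # [[200.0, 10.0], [105.0, 10.0]]
# Recomposition: swap the background while F2 and K1 stay unchanged.
print(composite(new_background, f2, k1))  # [[200.0, 50.0], [125.0, 50.0]]
```

Because F1, F2 and K1 travel as separate streams, recomposition is just a second call with a different background.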

The above-described method of decoding the image VOP
by VOP has a problem associated with motion compensation.
The size and position of a VOP vary with time; that is,
VOPs belonging to the same VO differ in size and position
from one another. Therefore, when a VOP at a different
time is referred to, for example in the motion compensation
process, the flags indicating the position and size of the
VOP must be encoded and transmitted, as described in
detail below with reference to FIG. 55.

In FIG. 55, an image F11 corresponds to a VOP at a time
t of a certain video object VO0, and an image F12
corresponds to a VOP at the same time t of a video object
VO1. The images F11 and F12 differ in size from each
other. The positions of the images F11 and F12 are
represented by absolute coordinates OST0 and OST1,
respectively.

If a VOP to be encoded and a VOP to be referred to are 
placed in an absolute coordinate system, and a reference 
position in absolute coordinates is transmitted as a motion 
vector, it becomes possible to realize motion compensation. 
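To illustrate motion compensation in absolute coordinates, the sketch below stores a reference VOP in a frame memory at its absolute position and fetches a prediction block by adding a motion vector to the current block's absolute position. The frame-memory layout (`None` for locations the VOP does not cover) and the names `place_vop` and `predict_block` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical frame-memory representation: None marks a location that
# holds no VOP pixel (before any padding is applied).
Frame = List[List[Optional[int]]]


def place_vop(width: int, height: int, vop: List[List[int]],
              origin: Tuple[int, int]) -> Frame:
    """Store a VOP in a frame memory at its absolute position.

    origin is the absolute (x, y) coordinate of the VOP's top-left
    corner, playing the role of OST0/OST1 in FIG. 55.
    """
    frame: Frame = [[None] * width for _ in range(height)]
    ox, oy = origin
    for y, row in enumerate(vop):
        for x, value in enumerate(row):
            frame[oy + y][ox + x] = value
    return frame


def predict_block(ref: Frame, pos: Tuple[int, int], mv: Tuple[int, int],
                  size: int) -> Frame:
    """Fetch the size x size prediction for the block at absolute pos.

    Both VOPs share one absolute coordinate system, so the motion
    vector is simply the offset between two absolute positions.
    """
    x0, y0 = pos[0] + mv[0], pos[1] + mv[1]
    if x0 < 0 or y0 < 0 or y0 + size > len(ref) or x0 + size > len(ref[0]):
        raise ValueError("motion vector points outside the frame memory")
    return [row[x0:x0 + size] for row in ref[y0:y0 + size]]


# A 2x2 reference VOP stored with its top-left corner at (4, 2).
ref = place_vop(8, 8, [[1, 2], [3, 4]], origin=(4, 2))
# Current block at absolute (2, 1); the vector (2, 1) lands on the VOP.
print(predict_block(ref, pos=(2, 1), mv=(2, 1), size=2))  # [[1, 2], [3, 4]]
```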

In this case, the motion compensation is performed as 
follows. In the following description, it is assumed that the 
image has an arbitrary shape. In the case where the VOP has 
a rectangular shape, the motion compensation can be per- 
formed according to the known method such as that defined 
in the H.263 standard. 

FIG. 56 illustrates a current VOP to be encoded. The VOP 
has a rectangular shape containing an image object wherein 
the size of the rectangle is an integral multiple of 16 in both 
horizontal and vertical directions. The size of the rectangle 
of the VOP is selected such that the resultant rectangle is a 
minimum one which can contain the object. When the VOP 
is encoded, encoding and motion compensation are per- 
formed from one macroblock to another wherein each 
macroblock has a size of 16x16 pixels. The size of each 
macroblock may also be set to 8x8 pixels, and the motion 
compensation may be performed from one macroblock to 
another having the same size. 

FIG. 57 illustrates a VOP to be referred to. The VOP is
stored at a predetermined location of a frame memory in
accordance with the flag indicating the position of the VOP
in absolute coordinates and the flag indicating the VOP
size. In the case of a VOP having an arbitrary shape, a
problem occurs when a motion vector is extracted, because
the VOP has an area containing an image and an area
containing no image.

First, the process performed on the reference VOP is
described. In the case where the reference VOP has an
arbitrary shape, the pixel values in the area containing no
image are calculated from the pixel values in the area
containing an image, as described below.

1. First, the pixel values in the outside of the image object, 
in which there is no image, are set to 0. 

2. The VOP is then scanned in the horizontal direction.
Each horizontal line of the VOP is divided into line segments
in which all pixel values are 0 and line segments in which
all pixel values are not equal to 0. Line segments in which
all pixel values are non-zero are left unchanged. The other
line segments are either segments bounded at both ends by
non-zero pixels, or segments with one end at an edge of the
VOP and the other end at a non-zero pixel. In a segment
bounded at both ends by non-zero pixels, all pixel values on
the segment are replaced with the average of the two
non-zero pixel values at its ends. In the other case, all pixel
values on the segment are replaced with the non-zero pixel
value at its one end.

3. The process step 2 is also performed in the vertical 
direction. 

4. For those pixels whose values are changed in both
steps 2 and 3, the pixel values are replaced by the mean of
the two values obtained in steps 2 and 3.

5. For those pixels that still have a pixel value of 0 after
step 4 has been completed, the pixel values are replaced
by the value of the non-zero pixel at the nearest location. If
there are two nearest non-zero pixels, the mean value of
these two pixel values is employed.
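A minimal sketch of padding steps 1 to 5, under a few stated assumptions: object pixels are identified from the key signal rather than from a value of 0 (so a genuine object pixel of value 0 is not mistaken for an empty one); the horizontal and vertical passes both read the step-1 image; "nearest" in step 5 means smallest Euclidean distance, and any number of equally near pixels is averaged, which generalises the two-pixel case in the text. The names `_pad_line` and `pad_reference` are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def _pad_line(values: List[float], inside: List[bool]) -> Row:
    """Steps 2/3 on one line: fill each run of outside pixels.

    Returns the fill value for every outside pixel this pass fills and None
    elsewhere (object pixels, and lines with no object pixel at all).
    """
    n = len(values)
    out: Row = [None] * n
    i = 0
    while i < n:
        if inside[i]:
            i += 1
            continue
        start = i
        while i < n and not inside[i]:
            i += 1
        left = values[start - 1] if start > 0 else None  # object pixel before run
        right = values[i] if i < n else None              # object pixel after run
        if left is not None and right is not None:
            fill = (left + right) / 2                     # bounded both sides: average
        elif left is not None or right is not None:
            fill = left if left is not None else right    # touches VOP edge: copy
        else:
            continue                                      # line has no object pixel
        for j in range(start, i):
            out[j] = fill
    return out


def pad_reference(vop: List[List[float]], key: List[List[int]]) -> List[List[float]]:
    h, w = len(vop), len(vop[0])
    inside = [[k != 0 for k in row] for row in key]
    # Step 1: discard whatever lies outside the object.
    base = [[vop[y][x] if inside[y][x] else 0 for x in range(w)] for y in range(h)]
    # Steps 2 and 3: horizontal and vertical passes, both over the step-1 image.
    horiz = [_pad_line(base[y], inside[y]) for y in range(h)]
    vert = [_pad_line([base[y][x] for y in range(h)],
                      [inside[y][x] for y in range(h)]) for x in range(w)]
    # Step 4: average where both passes filled a pixel, else keep the one that did.
    merged: List[Row] = []
    for y in range(h):
        row: Row = []
        for x in range(w):
            hv, vv = horiz[y][x], vert[x][y]
            if inside[y][x]:
                row.append(base[y][x])
            elif hv is not None and vv is not None:
                row.append((hv + vv) / 2)
            else:
                row.append(hv if hv is not None else vv)
        merged.append(row)
    # Step 5: remaining holes take the nearest filled pixel (ties averaged).
    filled = [(y, x) for y in range(h) for x in range(w) if merged[y][x] is not None]
    if not filled:
        raise ValueError("VOP contains no object pixels")
    result = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if merged[y][x] is not None:
                result[y][x] = merged[y][x]
                continue
            dists = [((fy - y) ** 2 + (fx - x) ** 2, fy, fx) for fy, fx in filled]
            best = min(d for d, _, _ in dists)
            ties = [merged[fy][fx] for d, fy, fx in dists if d == best]
            result[y][x] = sum(ties) / len(ties)
    return result


print(pad_reference([[2, 0], [0, 6]], [[1, 0], [0, 1]]))     # [[2, 4.0], [4.0, 6]]
print(pad_reference([[5, 0, 0, 15, 0]], [[1, 0, 0, 1, 0]]))  # [[5, 10.0, 10.0, 15, 15]]
```

In the first example each empty pixel is filled by both passes (2 horizontally, 6 vertically), so step 4 averages them to 4.0.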

When a motion vector is detected, the pixel values in the
non-image areas of a reference VOP are set to non-zero
values according to the above-described method. A
prediction error relative to the reference image is calculated
for a macroblock to be encoded, and the vector which gives
the minimum prediction error is employed as the motion
vector. In this calculation process, the VOP to be encoded
may itself have an arbitrary shape, and the macroblock to be
encoded may include an area containing no image. When the
macroblock includes such an area, the pixels in that area are
ignored in the calculation of the prediction error; that is,
the prediction error is calculated using only those pixels
corresponding to an image.

Whether each pixel in the VOP corresponds to an image 
or not can be judged by referring to the corresponding key 
signal. If the corresponding key signal has a value of 0, the
pixel is not in an image; otherwise, the pixel is in an
image.
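The masked prediction-error search described in the two preceding paragraphs can be sketched as a full-search block matcher that uses the sum of absolute differences (SAD) as the prediction error and skips every pixel whose key value is 0. The text does not name the error measure or the search strategy, so SAD and exhaustive search over a ±search_range window are assumptions, as are the names `masked_sad` and `find_motion_vector`. The reference is assumed to be already padded and to share the current VOP's absolute coordinate system.

```python
from typing import List, Tuple

Image = List[List[int]]


def masked_sad(cur: Image, cur_key: Image, ref: Image,
               bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """SAD over the object pixels of one block; key == 0 pixels are skipped."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            if cur_key[y][x] == 0:
                continue  # outside the object: ignored in the prediction error
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def find_motion_vector(cur: Image, cur_key: Image, ref: Image, bx: int,
                       by: int, size: int, search_range: int) -> Tuple[int, int]:
    """Exhaustive search: return the (dx, dy) with the minimum masked SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would leave the reference frame.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            sad = masked_sad(cur, cur_key, ref, bx, by, dx, dy, size)
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    if best is None:
        raise ValueError("no candidate vector fits inside the reference")
    return best[1], best[2]


# Reference with unique values; the current frame is the reference displaced
# so that the true motion vector is (dx, dy) = (-1, 1).
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[ref[y + 1][x - 1] if y + 1 < 6 and x >= 1 else 0 for x in range(6)]
       for y in range(6)]
key = [[1] * 6 for _ in range(6)]
cur[2][2], key[2][2] = 999, 0  # non-object pixel with a bogus value
print(find_motion_vector(cur, key, ref, bx=2, by=2, size=2, search_range=2))  # (-1, 1)
```

The bogus value at a key-0 pixel does not disturb the result, because that pixel never enters the error sum.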

Detecting the motion vector with the technique described
above requires a large amount of computation. There is
therefore a need for a method of performing the computation
more simply.

In view of the above, it is an object of the present
invention to provide a technique that improves the encoding
efficiency and reduces the computation cost.

SUMMARY OF THE INVENTION 

According to an aspect of the present invention, there is 
provided an image signal encoding apparatus for encoding a 
plurality of image signals, at least one of the plurality of
image signals being an image signal representing a moving 
image object, at least one of the plurality of image signals 
including a signal used to combine it with other image 
signal(s) of the plurality of image signals, the apparatus 
comprising: 

an image supplier for supplying a base layer image signal 
and an enhancement layer image signal scalably represent- 
ing the image signal representing a moving image object; 
an enhancement layer encoder for encoding the enhance- 
ment layer image signal thereby generating an encoded 
enhancement layer signal; and 
a base layer encoder for encoding the base layer image 
signal thereby generating an encoded base layer signal; 
wherein the enhancement layer encoder comprises:

a generator for generating a reference image signal used 
to calculate a motion vector of the enhancement layer 
image signal to be encoded, the reference image signal 
being generated by replacing the values of pixels
outside the image object of the enhancement layer 
image signal with the values of predetermined pixels of 
the base layer image signal; 

a detector for detecting the motion vector of the enhance- 
ment layer image signal to be encoded using the 
reference image signal; and 

an enhancement layer encoder for encoding the enhance- 
ment layer image signal to be encoded using a pre- 
dicted image signal of the enhancement layer image 
signal to be encoded, the predicted image signal being 
generated by performing motion compensation using 
the motion vector detected. 
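The core idea of this aspect, building the enhancement-layer reference by filling the pixels outside the object with base-layer pixels rather than extrapolating, can be sketched as below. The text says only "predetermined pixels of the base layer image signal"; this sketch assumes a base layer at lower resolution, nearest-neighbour upsampling by an integer `factor`, and co-located pixels as the predetermined ones. All names are hypothetical. Compared with the five-step padding procedure described earlier, each outside pixel here needs a single lookup.

```python
from typing import List

Image = List[List[int]]


def upsample_nearest(base: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling: repeat each pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in base for _ in range(factor)]


def enhancement_reference(enh: Image, enh_key: Image, base: Image,
                          factor: int = 2) -> Image:
    """Build the motion-search reference for the enhancement layer.

    Pixels inside the object (key != 0) keep their enhancement-layer value;
    every pixel outside it takes the co-located pixel of the upsampled base
    layer image. Assumption: co-located pixels are the "predetermined" ones.
    """
    up = upsample_nearest(base, factor)
    return [[e if k else b for e, k, b in zip(e_row, k_row, b_row)]
            for e_row, k_row, b_row in zip(enh, enh_key, up)]


base = [[1, 2], [3, 4]]                  # base layer at half resolution
enh = [[9] * 4 for _ in range(4)]        # enhancement layer VOP
enh_key = [[1 if x < 2 and y < 2 else 0 for x in range(4)] for y in range(4)]
for row in enhancement_reference(enh, enh_key, base):
    print(row)
# [9, 9, 2, 2]
# [9, 9, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```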

According to another aspect of the invention, there is 
provided an image signal encoding method for encoding a 
plurality of image signals, at least one of the plurality of 
image signals being an image signal representing a moving 
image object, at least one of the plurality of image signals 
including a signal used to combine it with other image 
signal(s) of the plurality of image signals, the method 
comprising the steps of: 

supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 

encoding the enhancement layer image signal thereby
generating an encoded enhancement layer signal; and 

encoding the base layer image signal thereby generating 
an encoded base layer signal; 

wherein the step of encoding the enhancement layer 
image signal comprises the steps of:

generating a reference image signal used to calculate a 
motion vector of the enhancement layer image signal to 
be encoded, the reference image signal being generated 
by replacing the values of pixels outside the image 
object of the enhancement layer image signal with the 
values of predetermined pixels of the base layer image
signal; 

detecting the motion vector of the enhancement layer 
image signal to be encoded using the reference image 
signal; and 

encoding the enhancement layer image signal to be 
encoded using a predicted image signal of the enhance- 
ment layer image signal to be encoded, the predicted 
image signal being generated by performing motion 
compensation using the motion vector detected. 
According to still another aspect of the invention, there is 
provided an image signal transmission method for encoding 
a plurality of image signals and then transmitting the 
encoded signals, at least one of the plurality of image signals 
being an image signal representing a moving image object, 
at least one of the plurality of image signals including a
signal used to combine it with other image signal(s) of the
plurality of image signals, the method comprising the steps
of:

supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 

encoding the enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and

encoding the base layer image signal thereby generating 
an encoded base layer signal; 
wherein the step of encoding the enhancement layer image
signal comprises the steps of:

generating a reference image signal used to calculate a 
motion vector of the enhancement layer image signal to 
be encoded, the reference image signal being generated 
by replacing the values of pixels outside the image 
object of the enhancement layer image signal with the 
values of predetermined pixels of the base layer
image signal; 

detecting the motion vector of the enhancement layer 
image signal to be encoded using the reference image 
signal; 

encoding the enhancement layer image signal to be 
encoded using a predicted image signal of the enhance- 
ment layer image signal to be encoded, the predicted 
image signal being generated by performing motion 
compensation using the motion vector detected; and 
generating a flag indicating an image to be replaced; 
the method further comprising the step of transmitting the 
encoded enhancement layer image signal, the encoded 
base layer image signal, the motion vector, and the flag. 
According to a further aspect of the invention, there is 
provided an image signal decoding apparatus for receiving 
an encoded signal generated by encoding a plurality of 
image signals and then decoding the encoded signal, at least 
one of the plurality of image signals being an image signal 
representing a moving image object, at least one of the 
plurality of image signals including a signal used to combine 
it with other image signal(s) of the plurality of image 
signals, the encoded signal including an encoded enhance- 
ment layer signal, an encoded base layer signal, a motion 
vector, and a flag indicating an image to be replaced, the 
apparatus comprising: 
a separator for separating the encoded signal into the 
encoded enhancement layer signal, the encoded base 
layer signal, the motion vector, and the flag; 
a base layer decoder for decoding the encoded base layer 
signal thereby generating a decoded base layer image 
signal; and 

an enhancement layer decoder for decoding the encoded 
enhancement layer signal thereby generating a decoded 
enhancement layer image signal; 
wherein the enhancement layer decoder comprises: a 
replaced image generator for generating a replaced 
image signal by replacing the values of pixels outside 
an image object of the decoded enhancement layer 
image signal with the values of predetermined pixels of 
the base layer image signal in accordance with the flag; 
and a generator for generating the decoded enhance- 
ment layer image signal using a predicted image signal 
generated by performing motion compensation on the 
replaced image signal using the motion vector. 
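On the decoder side, the same replaced reference must be rebuilt before motion compensation so that the prediction matches the encoder's. The sketch below assumes the flag is a single boolean per VOP, that the decoded base layer has already been scaled to the enhancement resolution (`base_up`), and 8-bit samples clipped to [0, 255]; `replaced_reference` and `decode_block` are illustrative names, and the source does not specify how the residual is coded.

```python
from typing import List, Tuple

Image = List[List[int]]


def replaced_reference(enh_ref: Image, enh_key: Image, base_up: Image,
                       replace_flag: bool) -> Image:
    """Decoder-side counterpart of the encoder's reference generator.

    Assumption: replace_flag is one boolean per VOP. When it is set, pixels
    outside the object (key == 0) are taken from base_up, the decoded base
    layer already scaled to the enhancement-layer resolution.
    """
    if not replace_flag:
        return enh_ref
    return [[e if k else b for e, k, b in zip(e_row, k_row, b_row)]
            for e_row, k_row, b_row in zip(enh_ref, enh_key, base_up)]


def decode_block(ref: Image, bx: int, by: int, mv: Tuple[int, int],
                 residual: Image, max_value: int = 255) -> Image:
    """Motion-compensated prediction plus decoded residual, clipped.

    Assumption: 8-bit samples, so results are clipped to [0, max_value].
    """
    dx, dy = mv
    size = len(residual)
    return [[min(max_value, max(0, ref[by + dy + y][bx + dx + x] + residual[y][x]))
             for x in range(size)] for y in range(size)]


enh_ref = [[10, 10], [10, 10]]
enh_key = [[1, 0], [0, 0]]
base_up = [[50, 50], [50, 50]]
residual = [[1, -60], [0, 250]]

ref = replaced_reference(enh_ref, enh_key, base_up, replace_flag=True)
print(ref)                                        # [[10, 50], [50, 50]]
print(decode_block(ref, 0, 0, (0, 0), residual))  # [[11, 0], [50, 255]]
```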
According to another aspect of the invention, there is 
provided an image signal decoding method for receiving an 
encoded signal generated by encoding a plurality of image 
signals and then decoding the encoded signal, at least one of 
the plurality of image signals being an image signal repre- 
senting a moving image object, at least one of the plurality 
of image signals including a signal used to combine it with 
other image signal(s) of the plurality of image signals, the 
encoded signal including an encoded enhancement layer 
signal, an encoded base layer signal, a motion vector, and a 
flag indicating an image to be replaced, the method com- 
prising the steps of: 
separating the encoded signal into the encoded enhance- 
ment layer signal, the encoded base layer signal, the 
motion vector, and the flag; 
decoding the encoded base layer signal thereby generating
a decoded base layer image signal; and
decoding the encoded enhancement layer signal thereby 
generating a decoded enhancement layer image signal; 
wherein the step of decoding the enhancement layer
signal comprises the steps of:
generating a replaced image signal by replacing the values 
of pixels outside an image object of the decoded 
enhancement layer image signal with the values of 
predetermined pixels of the base layer image signal in
accordance with the flag; and 
generating the decoded enhancement layer image signal 
using a predicted image signal generated by performing 
motion compensation on the replaced image signal 
using the motion vector. 
According to still another aspect of the invention, there is 
provided an image signal recording medium capable of 
being decoded by a decoding apparatus, the recording 
medium including a recorded signal, the recorded signal 
including an encoded signal generated by encoding a plu- 
rality of image signals, at least one of the plurality of image 
signals being an image signal representing a moving image 
object, at least one of the plurality of image signals including 
a signal used to combine it with other image signal(s) of the 
plurality of image signals, the encoded signal including an 
encoded enhancement layer signal, an encoded base layer 
signal, a motion vector, and a flag indicating an image to be
replaced, the encoded signal being generated by the steps of: 
supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 
encoding the enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and 
encoding the base layer image signal thereby generating
an encoded base layer signal;
wherein the step of encoding the enhancement layer 
image signal comprises the steps of: generating a 
reference image signal used to calculate a motion 
vector of the enhancement layer image signal to be 
encoded, the reference image signal being generated by
replacing the values of pixels outside the image object 
of the enhancement layer image signal with the values 
of predetermined pixels of the base layer image signal; 
detecting the motion vector of the enhancement layer
image signal to be encoded using the reference image
signal; encoding the enhancement layer image signal to 
be encoded using a predicted image signal of the 
enhancement layer image signal to be encoded, the 
predicted image signal being generated by performing 
motion compensation using the motion vector detected;
and generating a flag indicating an image to be 
replaced. 

According to still another aspect of the invention, there is 
provided an image signal encoding apparatus for encoding a 
plurality of image signals, at least one of the plurality of
image signals being an image signal representing a moving 
image object, at least one of the plurality of image signals 
including a signal used to combine it with other image 
signal(s) of the plurality of image signals, the apparatus 
comprising: 

an image supplier for supplying an enhancement layer 
image signal and a base layer image signal scalably 
representing the image signal representing a moving 
image object; 

an enhancement layer encoder for encoding the
enhancement layer image signal thereby generating an encoded
enhancement layer signal; and 



a base layer encoder for encoding the base layer image 
signal thereby generating an encoded base layer signal; 

wherein the base layer encoder comprises: 

a generator for generating a reference image signal used 
to calculate a motion vector of the base layer image 
signal to be encoded, the reference image signal being 
generated by replacing the values of pixels outside the 
image object of the base layer image signal with the 
pixel values obtained by extrapolating the pixel values 
inside the image object; 

a detector for detecting the motion vector of the base layer 
image signal to be encoded using the reference image 
signal; and 

an encoder for encoding the base layer
image signal to be encoded using a predicted image 
signal of the base layer image signal to be encoded, the 
predicted image signal being generated by performing 
motion compensation using the motion vector detected. 
According to still another aspect of the invention, there is 
provided an image signal encoding method for encoding a 
plurality of image signals, at least one of the plurality of 
image signals being an image signal representing a moving 
image object, at least one of the plurality of image signals 
including a signal used to combine it with other image 
signal(s) of the plurality of image signals, the method 
comprising the steps of: 
supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 
encoding the enhancement layer image signal thereby
generating an encoded enhancement layer signal; and 
encoding the base layer image signal thereby generating 

an encoded base layer signal; 
wherein the step of encoding the base layer image signal
comprises the steps of:
generating a reference image signal used to calculate a 
motion vector of the base layer image signal to be 
encoded, the reference image signal being generated by 
replacing the values of pixels outside the image object 
of the base layer image signal with the pixel values 
obtained by extrapolating the pixel values inside the 
image object; 

detecting the motion vector of the base layer image signal 

to be encoded using the reference image signal; and 
encoding the base layer image signal to be encoded using 
a predicted image signal of the base layer image signal 
to be encoded, the predicted image signal being gen- 
erated by performing motion compensation using the 
motion vector detected. 
According to still another aspect of the invention, there is 
provided an image signal transmission method for encoding 
a plurality of image signals and then transmitting the 
encoded signals, at least one of the plurality of image signals 
being an image signal representing a moving image object, 
at least one of the plurality of image signals including a 
signal used to combine it with other image signal(s) of the 
plurality of image signals, the method comprising the steps 
of: 

supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 

encoding the enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and 

encoding the base layer image signal thereby generating 
an encoded base layer signal; 
wherein the step of encoding the base layer image signal
comprises the steps of:

generating a reference image signal used to calculate a 
motion vector of the base layer image signal to be 
encoded, the reference image signal being generated by
replacing the values of pixels outside the image object 
of the base layer image signal with the pixel values 
obtained by extrapolating the pixel values inside the 
image object; 

detecting the motion vector of the base layer image signal 

to be encoded using the reference image signal; 
encoding the base layer image signal to be encoded using 
a predicted image signal of the base layer image signal 
to be encoded, the predicted image signal being gen- 
erated by performing motion compensation using the 
motion vector detected; and 
transmitting the encoded enhancement layer signal and 

the encoded base layer signal. 
According to still another aspect of the invention, there is
provided an image signal decoding apparatus for receiving 
an encoded signal generated by encoding a plurality of 
image signals and then decoding the encoded signal, at least 
one of the plurality of image signals being an image signal 
representing a moving image object, at least one of the
plurality of image signals including a signal used to combine 
it with other image signal(s) of the plurality of image 
signals, the encoded signal including an encoded enhance- 
ment layer signal, an encoded base layer signal, a motion 
vector, and a flag indicating an image to be replaced, the
apparatus comprising: 

a separator for separating the encoded signal into the 
encoded enhancement layer signal, the encoded base 
layer signal, the motion vector, and the flag; 
a base layer decoder for decoding the encoded base layer 
signal thereby generating a decoded base layer image 
signal; and 

an enhancement layer decoder for decoding the encoded 
enhancement layer signal thereby generating a decoded 
enhancement layer image signal; 

wherein the base layer decoder comprises:

a replaced image generator for generating a replaced 
image signal by replacing the values of pixels outside 
an image object of the decoded base layer image signal
with the pixel values obtained by extrapolating the 
pixel values inside the image object in accordance with 
the flag; and

a generator for generating the decoded base layer image
signal using a predicted image signal generated by
performing motion compensation on the replaced 
image signal using the motion vector. 
According to still another aspect of the invention, there is 
provided an image signal decoding method for receiving
an encoded signal generated by encoding a plurality of
image signals and then decoding the encoded signal, at least 
one of the plurality of image signals being an image signal 
representing a moving image object, at least one of the 
plurality of image signals including a signal used to combine 
it with other image signal(s) of the plurality of image
signals, the encoded signal including an encoded enhance- 
ment layer signal, an encoded base layer signal, a motion 
vector, and a flag indicating an image to be replaced, the 
method comprising the steps of: 
separating the encoded signal into the encoded
enhancement layer signal, the encoded base layer signal, the
motion vector, and the flag; 
decoding the encoded base layer signal thereby generating
a decoded base layer image signal; and
decoding the encoded enhancement layer signal thereby 
generating a decoded enhancement layer image signal; 
wherein the step of decoding the encoded base layer signal
comprises the steps of:
generating a replaced image signal by replacing the values 
of pixels outside an image object of the decoded base 
layer image signal with the values obtained by
extrapolating pixel values inside the image object in
accordance with the flag; and
generating the decoded base layer image signal using a 
predicted image signal generated by performing motion 
compensation on the replaced image signal using the 
motion vector. 
According to still another aspect of the invention, there is 
provided an image signal recording medium capable of 
being decoded by a decoding apparatus, the recording 
medium including a recorded signal, the recorded signal 
including an encoded signal generated by encoding a plu- 
rality of image signals, at least one of the plurality of image 
signals being an image signal representing a moving image 
object, at least one of the plurality of image signals including 
a signal used to combine it with other image signal(s) of the 
plurality of image signals, the encoded signal including an 
encoded enhancement layer signal, an encoded base layer 
signal, and a motion vector, the encoded signal being 
generated by the steps of: 
supplying a base layer image signal and an enhancement 
layer image signal scalably representing the image 
signal representing a moving image object; 
encoding the enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and 
encoding the base layer image signal thereby generating 

an encoded base layer signal; 
wherein the step of encoding the base layer image signal
comprises the steps of:

generating a reference image signal used to calculate a 
motion vector of the base layer image signal to be 
encoded, the reference image signal being generated by 
replacing the values of pixels outside the image object 
of the base layer image signal with the pixel values
obtained by extrapolating the pixel values inside the 
image object; 

detecting the motion vector of the base layer image signal 
to be encoded using the reference image signal; and 

encoding the base layer image signal to be encoded using 
a predicted image signal of the base layer image signal to be 
encoded, the predicted image signal being generated by 
performing motion compensation using the motion vector 
detected.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram illustrating an example of the 
construction of a VOP encoder employed in an image signal
encoding apparatus according to the present invention; 

FIG. 2 is a schematic representation of a process per- 
formed on pictures in the enhancement layer and the base 
layer; 

FIG. 3 is a schematic representation of the relationship 
between images in the enhancement layer and the base layer; 

FIG. 4 is a schematic representation of the relationship 
between images in the enhancement layer and the base layer; 

FIG. 5 is a schematic representation of the relationship 
between images in the enhancement layer and the base layer; 
FIG. 6 is a schematic representation of the relationship between images in the enhancement layer and the base layer;

FIG. 7 is a schematic representation of the relationship between images in the enhancement layer and the base layer;

FIG. 8 is a block diagram illustrating an example of the construction of the base layer encoder 204 shown in FIG. 1;

FIG. 9 is a flow chart illustrating the operation of the pixel replacement circuit 221 shown in FIG. 8;

FIG. 10 is a schematic representation of the process shown in the flow chart of FIG. 9;

FIG. 11 is a block diagram illustrating an example of the construction of the enhancement layer encoder 203 shown in FIG. 1;

FIG. 12 is a schematic representation of a process performed on pictures in the enhancement layer and the base layer;

FIG. 13 is a schematic representation of a pixel replacement process;

FIG. 14 is a flow chart illustrating the process performed by the pixel replacement circuit 231 shown in FIG. 11;

FIG. 15 is a schematic representation of a pixel replacement process;

FIG. 16 is a schematic representation of a pixel replacement process;

FIG. 17 is a flow chart illustrating the pixel replacement process;

FIG. 18 is a block diagram illustrating an example of the construction of a VOP decoder used in an image signal decoding apparatus according to the present invention;

FIG. 19 is a block diagram illustrating an example of the construction of the enhancement layer decoder 254 shown in FIG. 18;

FIG. 20 is a block diagram illustrating an example of the construction of the base layer decoder 253 shown in FIG. 18;

FIG. 21 is a schematic representation of the structure of a bit stream;

FIG. 22 illustrates the syntax of a video session;

FIG. 23 illustrates the syntax of a video object;

FIG. 24 illustrates the syntax of a video object layer;

FIG. 25 illustrates the syntax of a video object plane;

FIG. 26 illustrates the syntax of a video object plane;

FIG. 27 illustrates the syntax of a video object plane;

FIG. 28 ... spatial scalability encoding;

FIG. 29 ... spatial scalability encoding;

FIG. 30 ... spatial scalability encoding;

FIG. 31 ... spatial scalability encoding;

FIG. 32 ... spatial scalability encoding;

FIG. 33 ... spatial scalability encoding;

FIG. 34 is a block diagram illustrating ... a VOP encoder used in the image signal encoding apparatus according to the present invention;

FIG. 35 is a block diagram illustrating an example of the construction of the first enhancement layer encoder 203 shown in FIG. 34;

FIG. 36 is a flow chart illustrating the process performed by the pixel replacement circuit 231 shown in FIG. 35;

FIG. 37 is a schematic representation of the relationship among images in the base layer, the first enhancement layer, and the second enhancement layer;

FIG. 38 is a schematic representation of the relationship among images in the base layer, the first enhancement layer, and the second enhancement layer;

FIG. 39 is a block diagram illustrating another example of the construction of a VOP decoder used in the image signal decoding apparatus according to the present invention;

FIG. 40 is a block diagram illustrating an example of the construction of the first enhancement layer decoder 253 shown in FIG. 39;

FIG. 41 is a block diagram illustrating an example of the construction of the VOP reconstruction circuit 259 shown in FIG. 39;

FIG. 42 is a block diagram illustrating another example of the construction of the VOP reconstruction circuit 259 shown in FIG. 39;

FIG. 43 illustrates the syntax of a video object layer;

FIG. 44 is a block diagram illustrating an example of the construction of an image signal encoding apparatus;

FIG. 45 is a block diagram illustrating an example of the construction of an image signal decoding apparatus;

FIG. 46 is a block diagram illustrating another example of the construction of an image signal encoding apparatus;

FIG. 47 is a block diagram illustrating another example of the construction of an image signal decoding apparatus;

FIG. 48 is a schematic representation of the process of combining a plurality of images into a single composite image;

FIG. 49 is a schematic representation of the process of combining a plurality of images into a single composite image;

FIG. 50 is a schematic representation of the process of combining a plurality of images into a single composite image;

FIG. 51 is a block diagram illustrating still another example of the construction of an image signal encoding apparatus;

FIG. 52 is a block diagram illustrating still another example of the construction of an image signal decoding apparatus;

FIG. 53 is a block diagram illustrating an example of the construction of the VOP encoder 103-0 shown in FIG. 51;

FIG. 54 is a block diagram illustrating an example of the construction of the VOP decoder 112-0 shown in FIG. 52;

FIG. 55 is a schematic representation of absolute coordinates ...;

FIG. 56 is a schematic representation of an image object; and

FIG. 57 is a schematic representation of an image object.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

... encoding apparatus according to the present invention, scalable encoding is performed VO by VO ... When scalable encoding is performed VO by VO, an image in the enhancement layer can be a part of an image in the base layer. For example, if a particularly important area in a base layer image is improved in image quality, the resultant image will be good enough in low bit rate applications. This technique also allows a reduction of redundant bits. In this case, the particular area of the base layer image can be improved in image quality (spatial resolution or temporal resolution) by decoding both the enhancement layer and the base layer bit streams.

In the case where the enhancement layer image corresponds
to a particular area of the base layer image, the base
layer image has information about the image outside the
enhancement layer image. In the motion compensation, this
information can be used to improve the encoding efficiency.

That is, in the present invention, pixel replacement is 
performed as follows. 

1. When the enhancement layer is referred to in the 
motion vector extraction or in the motion compensation, 
pixels whose pixel value is equal to 0 are replaced with 
pixels at corresponding locations in the base layer. 

2. In the above pixel replacement process for those pixels 
whose pixel value is equal to 0, a flag is used to indicate 
whether the pixel values should be replaced with proper base 
layer image signals. This flag is transmitted after being 
encoded. 

3. When an image of a VOP having an arbitrary shape is
expanded using an interpolation filter or the like, the
expansion is performed after the pixel replacement process.
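Items 1 and 2 above can be illustrated with a short Python sketch. It assumes the base layer image has already been converted to the enhancement layer resolution and aligned to the same absolute position, and it models the transmitted flag of item 2 as a boolean parameter; the function and parameter names are hypothetical.

```python
def enhancement_reference(enh, base_aligned, replace_flag=True):
    """Build the enhancement-layer reference used for motion search and compensation.

    enh          -- enhancement-layer VOP (list of rows); pixels outside the object are 0
    base_aligned -- base-layer image already converted to the enhancement resolution
                    and aligned to the same absolute position (assumed done elsewhere)
    replace_flag -- stands in for the transmitted flag of item 2
    """
    if not replace_flag:
        return [row[:] for row in enh]
    # Item 1: zero-valued pixels take the base-layer pixel at the same location.
    return [[b if e == 0 else e for e, b in zip(enh_row, base_row)]
            for enh_row, base_row in zip(enh, base_aligned)]
```

Per item 3, any expansion of an arbitrarily shaped VOP would be applied to the output of this replacement rather than to the raw VOP.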

With this technique, when an image is encoded in a
scalable fashion VO by VO, it is possible to perform motion
compensation in a highly efficient fashion at a reduced
calculation cost even if the size and/or the shape of the VO
vary with time. As a result, highly efficient scalability can
be realized.

A first embodiment of an image signal encoder according 
to the present invention will be described below. In this 
embodiment, the VOP encoders 103-0 to 103-n and the VOP 
decoders 112-0 to 112-n shown in FIGS. 51 and 52 are 
replaced by scalable encoders and scalable decoders, 
respectively, thereby achieving the bit stream scalability. 

FIG. 1 illustrates an example of a VOP encoder 103 
according to the first embodiment. An image signal and a 
key signal of each VOP as well as a flag FSZ indicating the 
VOP size and a flag FPOS indicating the absolute coordinate 
position of the VOP are input to a layered image signal 
generator 201. The layered image signal generator 201 
generates a plurality of image signals in different layers from 
the input image signal. For example, in the case of the spatial
scalability, the layered image signal generator 201 generates
a base layer image signal and key signal by reducing the
input image signal and key signal. Although in the specific
example shown in FIG. 1 image signals in two layers (an
enhancement layer image signal and a base layer image
signal) are generated, image signals in a greater number of
layers may also be generated. For simplicity, it is assumed
in the following description that image signals in two layers
are generated.

In the case of the temporal scalability (scalability along
the time axis), the layered image signal generator 201
switches the output image signal between the base layer
image and the enhancement layer image depending on the
time. For example, as shown in FIG. 2, VOP0, VOP2,
VOP4, and VOP6 are output in the base layer and VOP1,
VOP3, and VOP5 are output in the enhancement layer. In the
case of the temporal scalability, expansion/reduction of the
image signal (conversion of resolution) is not performed.
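The FIG. 2 split can be sketched in Python as follows. The even/odd assignment and the choice of the base layer VOPs immediately before and after each enhancement VOP as its references follow the FIG. 2 example described in this specification; generalizing that pattern to a sequence of any length, and the name temporal_layers, are assumptions.

```python
def temporal_layers(vops):
    """Split a VOP sequence for temporal scalability as in FIG. 2."""
    base = [(i, v) for i, v in enumerate(vops) if i % 2 == 0]
    enhancement = [(i, v) for i, v in enumerate(vops) if i % 2 == 1]
    # Each enhancement VOP is predicted from the base-layer VOPs on either side of it.
    references = {i: (i - 1, i + 1) for i, _ in enhancement if i + 1 < len(vops)}
    return base, enhancement, references
```

With VOP0 to VOP6 as input, the base layer receives VOP0, VOP2, VOP4, and VOP6, and VOP3, for instance, is predicted from VOP2 and VOP4. No resolution conversion is involved.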

In the case of the SNR (signal-to-noise ratio) scalability,
the layered image signal generator 201 supplies the input
image signal and key signal directly to the respective layers.
That is, the same image signal and key signal are supplied
to both the base layer and the enhancement layer.
In the case of the spatial scalability, the layered image
signal generator 201 performs resolution conversion on the
input image signal and key signal, and supplies the resultant
image signal and key signal to the base layer. The resolution
conversion is performed by means of a reduction filtering
process using, for example, a reduction filter. Alternatively,
after the layered image signal generator 201 performs
resolution conversion on the input image signal and key
signal, the resultant image signal and key signal may be
supplied to the enhancement layer. In this case, the
resolution conversion is performed by means of an expansion
filtering process. Still alternatively, two separately generated
image signals and associated key signals (which may or may
not be equal in resolution) may be output from the layered
image signal generator 201 to the enhancement layer and the
base layer, respectively. In this case, which image is supplied
to which layer is determined in advance.
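The text does not specify the reduction filter. As an illustration only, the following Python sketch assumes a 2:1 reduction in each direction with a rounded 2x2 box-filter mean for the image and, for the key signal, the maximum of each 2x2 block so that any object pixel keeps the reduced pixel inside the object; the name reduce_2to1 and both filter choices are assumptions.

```python
def reduce_2to1(image, key):
    """Halve width and height of an image and its key signal (box filter sketch)."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("sketch assumes even width and height")
    img_out, key_out = [], []
    for y in range(0, h, 2):
        img_row, key_row = [], []
        for x in range(0, w, 2):
            block = [image[y][x], image[y][x + 1], image[y + 1][x], image[y + 1][x + 1]]
            kblock = [key[y][x], key[y][x + 1], key[y + 1][x], key[y + 1][x + 1]]
            img_row.append((sum(block) + 2) // 4)   # rounded mean of the 2x2 block
            key_row.append(max(kblock))             # object if any source pixel is
        img_out.append(img_row)
        key_out.append(key_row)
    return img_out, key_out
```

A practical reduction filter would normally use more taps to limit aliasing; the box filter is chosen here only for brevity.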

The scalable encoding method performed VO by VO will
be described below. The size and/or position of the VO may
or may not vary with time. The scalability can be performed
in either of the following modes.

1. The enhancement layer includes the entire area of the
base layer.

2. The enhancement layer corresponds to a partial area of
the base layer.

In the case of mode 1, the entire area of the base layer is
improved in image quality by decoding the enhancement
layer and the base layer. Herein, the improvement in image
quality refers to an improvement in the temporal resolution
in the case of the temporal scalability, and to an improvement
in the spatial resolution in the case of the spatial scalability.

In the case of mode 2, only the corresponding partial area
of the base layer is improved in image quality by
decoding the enhancement layer and the base layer.

In both modes 1 and 2, the VOP may have either a
rectangular or an arbitrary shape. FIG. 3 illustrates an
example of spatial scalability in mode 1 for the case where
the VOP has a rectangular shape. On the other hand, FIG. 4
illustrates an example of spatial scalability in mode 2 for
the case where the VOP has a rectangular shape.

FIGS. 5 and 6 illustrate examples of spatial scalability in
mode 1 for the case where the VOPs have an arbitrary shape.
FIG. 7 illustrates an example of spatial scalability in mode
2 for the case where the VOP has an arbitrary shape.

The mode of scalability is determined in advance, and the
layered image signal generator 201 sets the enhancement
and base layers in accordance with the predetermined mode.
The layered image signal generator 201 also outputs flags
indicating the sizes and absolute coordinate positions of
VOPs in the respective layers. For example, in the case of
the VOP encoder shown in FIG. 1, a flag FSZ_B indicating
the size of the base layer VOP and a flag FPOS_B indicating
the absolute coordinate position of the base layer VOP are
output to the base layer encoder 204. On the other hand, a
flag FSZ_E indicating the size of the enhancement layer
VOP and a flag FPOS_E indicating the absolute coordinate
position of the enhancement layer VOP are output to the
enhancement layer encoder 203 via the delay circuit 202.

Furthermore, the layered image signal generator 201
outputs a flag FR indicating the ratio of the size of the
enhancement layer VOP relative to the size of the base layer
VOP to the resolution converter 205 and the enhancement
layer encoder 203 via the delay circuit 202.
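The resolution converter 205 uses FR to bring the base layer image to the enhancement layer resolution; this specification states only that a filtering operation is used and that the data pass through unchanged when the ratio is 1. The Python sketch below assumes a positive integer ratio and uses plain pixel replication as the expansion filter; the name convert_resolution and the replication filter are assumptions.

```python
def convert_resolution(base, fr):
    """Expand a base-layer image by the integer size ratio fr (pixel replication)."""
    if fr == 1:
        return [row[:] for row in base]        # equal sizes: passed through unchanged
    if not isinstance(fr, int) or fr < 1:
        raise ValueError("this sketch only handles positive integer ratios")
    out = []
    for row in base:
        wide = [p for p in row for _ in range(fr)]   # repeat each pixel fr times
        out.extend(wide[:] for _ in range(fr))       # repeat each row fr times
    return out
```

An interpolating filter (for example bilinear) would give smoother references than replication at a slightly higher cost.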

Referring now to FIG. 8, the base layer encoder 204 will
be described below. In FIG. 8, similar elements to those in
FIG. 44 are denoted by similar reference numerals.

An input image signal is first supplied to a set of frame
memories 1, and stored therein in the predetermined order.
The set of frame memories 1 stores the image signal of the
VOP, the flag FSZ_B indicating the size of the VOP, and the
flag FPOS_B indicating the absolute coordinate position of
the VOP. The set of frame memories 1 can store image
signals and flags FSZ_B and FPOS_B for a plurality of
VOPs. The image signal to be encoded is supplied
macroblock by macroblock to a motion vector extraction
circuit 222 and an arithmetic operation circuit 3.

The motion vector extraction circuit 222 processes the
image data of each frame as an I-picture, a P-picture, or a
B-picture according to a predetermined procedure. In this
procedure, the processing mode is predefined for each
frame of an image sequence, and each frame is processed as
an I-picture, a P-picture, or a B-picture according to the
predefined processing mode (for example, frames are
processed in the order I, B, P, B, P, . . . , B, P).
Basically, I-pictures are subjected to intraframe encoding,
and P-pictures and B-pictures are subjected to interframe
prediction encoding, although the encoding mode for
P-pictures and B-pictures is adaptively varied macroblock
by macroblock in accordance with the prediction mode, as
will be described later.

The motion vector extraction circuit 222 extracts a motion 
vector with reference to a predetermined reference frame so 
as to perform motion compensation (interframe prediction). 
The motion compensation (interframe prediction) is per- 
formed in one of three modes: forward, backward, and 
forward-and-backward prediction modes. The prediction for 
a P-picture is performed only in the forward prediction 
mode, while the prediction for a B-picture is performed in 
one of the above-described three modes. The motion vector
extraction circuit 222 selects a prediction mode which can 
lead to a minimum prediction error, and generates a pre- 
dicted vector in the selected prediction mode. 
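The mode decision can be sketched in Python as below, with macroblocks flattened to lists of pixel values. The sketch assumes the prediction error is the sum of absolute differences (SAD) and that forward-and-backward prediction is the rounded average of the forward and backward predictions; neither detail is stated in the text, and the function names are hypothetical.

```python
def sad(block, prediction):
    """Sum of absolute differences, used here as the prediction error."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def select_prediction_mode(block, forward, backward=None, picture_type="B"):
    """Return (mode, error) for the candidate with the smallest prediction error."""
    candidates = {"forward": forward}            # the only mode for a P-picture
    if picture_type == "B" and backward is not None:
        candidates["backward"] = backward
        # Forward-and-backward prediction: rounded average of the two predictions.
        candidates["forward-and-backward"] = [(f + b + 1) // 2 for f, b in zip(forward, backward)]
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, sad(block, candidates[mode])
```

For a B-picture block [10, 10] with forward prediction [0, 0] and backward prediction [20, 20], the averaged prediction matches exactly, so the forward-and-backward mode is chosen with zero error.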

The prediction error is compared, for example, with the
dispersion of the given macroblock to be encoded. If the
dispersion of the macroblock is smaller than the prediction
error, motion-compensated prediction encoding is not
performed on that macroblock; instead, intraframe encoding
is performed. In this case, the prediction mode is referred
to as an intraframe encoding mode. The motion vector
extracted by the motion vector extraction circuit 222 and the
information indicating the prediction mode employed are
supplied to a variable-length encoder 6, a set of frame
memories 11, and a motion compensation circuit 12.
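The intra/inter decision described above can be sketched in Python as follows. The text does not define "dispersion"; this sketch assumes it is the sum of absolute deviations from the macroblock mean, so that it is in the same units as a SAD prediction error. The name use_intra is hypothetical.

```python
def use_intra(block, prediction_error):
    """True if the macroblock should be intraframe encoded."""
    mean = sum(block) / len(block)
    dispersion = sum(abs(p - mean) for p in block)   # sum of absolute deviations
    return dispersion < prediction_error
```

A flat macroblock has zero dispersion, so it is intraframe encoded whenever its prediction error is positive, while a busy macroblock is predicted unless the prediction is worse than its own spread.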

The motion vector will be described below. Since VOPs
differ from one another in size and position, it is
necessary to define a reference coordinate system in which
detected motion vectors are represented. An absolute
coordinate system is assumed to be defined herein, and
motion vectors are calculated using this coordinate system.
That is, a motion vector is calculated after a current VOP
and a predicted reference VOP are placed at proper positions
in accordance with the flags indicating their sizes and
positions. The details of the method of detecting motion
vectors will be described later.
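Anticipating the detection method described later (pixels whose key signal is 0 are excluded from the prediction error), a minimal Python sketch of full-search block matching in absolute coordinates might look like this. The search range, the use of SAD as the error, and skipping candidates that fall outside the reference VOP are assumptions; the reference is assumed to be the padded image held in the frame memories 11. The name find_motion_vector is hypothetical.

```python
def find_motion_vector(block, block_pos, block_key, ref, ref_pos, search=2):
    """Full-search block matching in absolute coordinates with a key-signal mask.

    block     -- current macroblock (list of rows); its top-left corner lies at
                 the absolute position block_pos = (x, y)
    block_key -- key signal of the macroblock; pixels with key 0 are ignored
    ref       -- reference VOP whose top-left corner lies at absolute ref_pos = (x, y)
    Returns (error, (dx, dy)) for the best candidate, or None if none fits.
    """
    bh, bw = len(block), len(block[0])
    rh, rw = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Displace in absolute coordinates, then convert to reference-VOP indices.
            ox = block_pos[0] + dx - ref_pos[0]
            oy = block_pos[1] + dy - ref_pos[1]
            if ox < 0 or oy < 0 or ox + bw > rw or oy + bh > rh:
                continue                      # candidate falls outside the reference VOP
            err = sum(abs(block[y][x] - ref[oy + y][ox + x])
                      for y in range(bh) for x in range(bw)
                      if block_key[y][x] != 0)
            if best is None or err < best[0]:
                best = (err, (dx, dy))
    return best
```

Because both positions are absolute, the same code works when the current VOP and the reference VOP have different sizes or origins.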

The motion compensation circuit 12 generates a predicted
image signal on the basis of the motion vector, and supplies
it to the arithmetic operation circuit 3. The arithmetic
operation circuit 3 calculates the difference between the
value of the given macroblock to be encoded and the value of
the predicted image. The result is supplied as a difference
image signal to a DCT circuit 4. In the case of an
intramacroblock, the arithmetic operation circuit 3 directly
transfers the value of the given macroblock to be encoded to
the DCT circuit 4 without performing any operation.

The DCT circuit 4 performs a DCT (discrete cosine
transform) operation on the received image signal, thereby
converting it to DCT coefficients. The resultant DCT
coefficients are supplied to a quantization circuit 5. The
quantization circuit 5 quantizes the DCT coefficients in
accordance with a quantization step corresponding to the
amount of data stored in a transmission buffer 7. The
quantized data is then supplied to the variable-length encoder 6.
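The transform-and-quantize stage, together with the buffer feedback of the transmission buffer 7 described below, can be sketched in Python as follows. The sketch assumes an orthonormal DCT-II on square blocks, plain uniform quantization with Python's round, and an MPEG-style quantization scale range of 1 to 31; none of these details are given in the text, and the function names are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def a(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[a(u) * a(v) * sum(block[i][j]
                               * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                               * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                               for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def update_scale(scale, occupancy, upper, lower, smallest=1, largest=31):
    """Coarser quantization when the buffer is nearly full, finer when nearly empty."""
    if occupancy >= upper:
        return min(scale + 1, largest)
    if occupancy <= lower:
        return max(scale - 1, smallest)
    return scale


def quantize(coeffs, step):
    """Uniform quantization of DCT coefficients with the given step."""
    return [[round(c / step) for c in row] for row in coeffs]
```

A flat 8x8 block of value 10 transforms to a single DC coefficient of 80 with all other coefficients zero, and a step of 8 quantizes that DC value to 10.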

The variable-length encoder 6 converts the quantized
data supplied from the quantization circuit 5 into a
variable-length code using, for example, the Huffman encoding
technique, in accordance with the quantization step (scale)
supplied from the quantization circuit 5. The obtained
variable-length code is supplied to the transmission buffer 7.

The variable-length encoder 6 also receives the quantization
step (scale) from the quantization circuit 5, as well as the
motion vector and the information indicating the prediction
mode (that is, the information indicating in which of the
intraframe prediction mode, the forward prediction mode, the
backward prediction mode, or the forward-and-backward
prediction mode the prediction has been performed) from the
motion vector extraction circuit 222, and converts these
received data into variable-length codes.

Furthermore, the variable-length encoder 6 receives the
flag FSZ_B indicating the size of the base layer VOP and
the flag FPOS_B indicating its position in the absolute
coordinates, and also encodes these flags. The
variable-length encoder 6 inserts a key signal bit stream
supplied from a key signal encoder 223 at a predetermined
position in the data bit stream output from the quantization
circuit 5. The resultant bit stream is supplied to the
transmission buffer 7.

The key signal associated with the base layer VOP to be
encoded is input to the key signal encoder 223. The key
signal is encoded according to a predetermined encoding
method such as DPCM, and the resultant key signal bit stream
is supplied to the variable-length encoder 6 and a key
signal decoder 224. The key signal decoder 224 decodes the
received key signal bit stream, and supplies the result to the
motion vector extraction circuit 222, the motion compensation
circuit 12, the DCT circuit 4, the inverse DCT circuit 9,
and the pixel replacement circuit 221. The decoded key
signal is also supplied to the enhancement layer encoder 203
shown in FIG. 1.
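DPCM is named above only as an example method for the key signal encoder 223. A minimal Python sketch of horizontal DPCM follows, assuming each key value is predicted from the previous pixel in the same row, with a prediction of 0 at the start of each row. The names dpcm_encode and dpcm_decode are hypothetical, and the entropy coding of the residuals is not shown.

```python
def dpcm_encode(key):
    """Replace each key value by its difference from the previous pixel in the row."""
    residuals = []
    for row in key:
        prev, diffs = 0, []
        for value in row:
            diffs.append(value - prev)
            prev = value
        residuals.append(diffs)
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the differences along each row."""
    key = []
    for row in residuals:
        acc, values = 0, []
        for diff in row:
            acc += diff
            values.append(acc)
        key.append(values)
    return key
```

For a key that is mostly constant along each row, most residuals are zero, which is what makes a subsequent variable-length code effective. The scheme is lossless, so the key signal decoder 224 recovers the key exactly.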

The transmission buffer 7 stores the received data
temporarily. Information representing the amount of data
stored in the transmission buffer 7 is fed back to the
quantization circuit 5. If the amount of residual data stored
in the transmission buffer 7 reaches an upper allowable limit,
the transmission buffer 7 supplies a quantization control
signal to the quantization circuit 5 so that the following
quantization operation is performed using an increased
quantization scale, thereby decreasing the amount of
quantized data. Conversely, if the amount of residual data
decreases to a lower allowable limit, the transmission buffer
7 supplies a quantization control signal to the quantization
circuit 5 so that the following quantization operation is
performed using a decreased quantization scale, thereby
increasing the amount of quantized data. In this way, an
overflow or underflow in the transmission buffer 7 is prevented. The encoded data stored in the transmission buffer 7 is read out at a specified time and supplied to the multiplexer 206 shown in FIG. 1.

The quantized data output by the quantization circuit 5 is also supplied to an inverse quantization circuit 8. The inverse quantization circuit 8 performs inverse quantization on the received data in accordance with the quantization step given by the quantization circuit 5. The data (DCT coefficients) generated by the inverse quantization circuit 8 are supplied to an IDCT (inverse DCT) circuit 9, which in turn performs an inverse DCT operation on the received data. The arithmetic operation circuit 10 then adds the resultant value to the predicted image value for each block according to the prediction mode. The resultant image signal is then supplied to a pixel replacement circuit 221 for use in generating a further predicted image. The image signal is subjected to a pixel replacement process in the pixel replacement circuit 221, and the resultant image signal is stored in a set of frame memories 11. In the case of an intramacroblock, the macroblock output from the IDCT circuit 9 is supplied to the set of frame memories 11 without being processed by the arithmetic operation circuit 10 and the pixel replacement circuit 221.

Referring to the flow chart shown in FIG. 9, the process performed by the pixel replacement circuit 221 is described. If it is concluded in step S1 that the position of a pixel to be processed is within an image object, that is, the corresponding key signal has a value which is not equal to zero, then the process goes to step S2, in which the pixel replacement circuit 221 directly outputs the pixel without performing any process on it. On the other hand, if the corresponding key signal is equal to 0, then the process goes to step S3, and 0 is substituted into that pixel.

In the case where the VOP has a rectangular shape, the key signal always has a value which is not equal to 0 (1 in the case of a binary key, 255 in the case of a gray scale key). Therefore, in this case, all pixels of the VOP are directly output without being subjected to any process.

Subsequent to step S3, a series of steps from S4 to S8 and a series of steps from S9 to S13 are performed in parallel. In step S4, the VOP under consideration is scanned in the horizontal direction. In this step, each horizontal line is divided into the following three types of line segments. Step S5 judges which of these three types each line segment is.

1. A line segment whose two ends are located at the two ends of the VOP.

2. A line segment with a non-zero pixel value at only one of its ends.

3. A line segment with non-zero pixel values at both of its ends.

For a line segment whose two ends are located at the two ends of the VOP (for example, a line segment in a blank space in FIG. 10A), zero is substituted into the value C in step S6. In the case where the line segment has non-zero pixel values at both ends (for example, a line segment located within a solid area in FIG. 10A, so that both ends of the line segment have a pixel value corresponding to black), the average of the pixel values at both ends is substituted into the value C in step S7. In the case where only one end of the line segment has a non-zero pixel value (for example, a horizontal line represented in FIG. 10A), that non-zero pixel value is substituted into the value C in step S8.

Thus, image processing is performed as shown in FIG. 10B.

Then in step S9, the VOP under consideration is scanned in the vertical direction. In this step, each vertical line is divided into the following three types of line segments. In the subsequent step S10, it is judged which of these three types each line segment is.

1. A line segment whose two ends are located at the two ends of the VOP.

2. A line segment with a non-zero pixel value at only one of its ends.

3. A line segment with non-zero pixel values at both of its ends.

For a line segment whose two ends are located at the two ends of the VOP, zero is substituted into the value B in step S11. In the case where the line segment has non-zero pixel values at both ends, the average of the pixel values at both ends is substituted into the value B in step S12. In the case where only one end of the line segment has a non-zero pixel value, that non-zero pixel value is substituted into the value B in step S13.

Thus, the VOP is subjected to image processing as shown in FIG. 10B.

In step S14, if both values B and C are equal to 0, the pixel values are maintained at 0. On the other hand, if only the value B is not equal to zero, the pixel values are replaced by the value B. In the case where only the value C is not equal to zero, the pixel values are replaced by the value C. If both values B and C are not equal to zero, the pixel values are replaced by the average of these.

Thus, the VOP is subjected to image processing as shown in FIG. 10C.

In step S15, after completion of the above processing steps, it is judged whether pixel values are equal to zero or not. The pixel values which are not equal to zero are directly output. Those pixels having a value of zero are subjected to replacement in step S16 such that each pixel value is replaced by the non-zero value of the pixel located nearest, in the horizontal or vertical direction, to the pixel under consideration. In this replacement, if there are two non-zero pixels at the nearest positions, the pixel value is replaced by the average of these two non-zero pixel values.

Thus, the VOP is subjected to image processing as shown in FIG. 10D.

After completion of the above replacement, the pixel replacement circuit 221 supplies the resultant image signal to the set of frame memories 11 and the resolution converter 205 shown in FIG. 1.

The set of frame memories 11 stores the image signal output from the pixel replacement circuit 221, the flag FSZ_B indicating the size of the VOP, and the flag FPOS_B indicating the absolute coordinate position of the VOP. The set of frame memories 11 also supplies a locally decoded image signal of the VOP to the enhancement layer encoder 203 via the resolution converter 205.

Now, the motion vector extraction circuit 222 is described. The motion vector extraction circuit 222 extracts a motion vector which results in a minimum prediction error for a macroblock to be encoded relative to a reference image signal which is supplied from the set of frame memories 11 depending on the prediction mode (I-, P-, or B-picture). The motion vector extraction circuit 222 also receives, from the key signal decoder 224, a locally decoded key signal associated with the macroblock being processed, and calculates the prediction error by referring to the corresponding key signal.

When the VOP to be encoded has an arbitrary shape, the macroblock to be encoded can have an area in which there is no image. In this case, those pixels in the non-image area
in the macroblock to be encoded are neglected in the calculation of the prediction error. That is, the prediction error of the macroblock being processed is calculated using only pixels in areas in which there is an image, and the motion vector is determined so that it gives a minimum prediction error.

It is possible to judge whether each pixel of the macroblock to be encoded corresponds to an image or not by referring to the locally decoded key signal associated with the macroblock to be encoded. That is, if the key signal corresponding to a pixel is equal to 0, then there is no image corresponding to that pixel, which means that the pixel is in an area outside an image object. On the other hand, if the key signal corresponding to a pixel has a value not equal to 0, then that pixel is in an area in which there is an image, that is, within an image object.

When the motion vector extraction circuit 222 refers to the key signal supplied from the key signal decoder 224, if the key signal is equal to 0, then the difference between the value of the pixel corresponding to the key signal and the reference image signal is not included in the calculation of the prediction error. When the VOP has a rectangular shape, the key signal always has a value not equal to 0 (1 in the case of a binary key, 255 in the case of a gray scale key), and thus all pixels of the macroblock are taken into account in the calculation of the prediction error.

Referring again to FIG. 1, the resolution converter 205 converts the resolution of the base layer image signal to the resolution corresponding to the enhancement layer image signal by means of a filtering operation in accordance with the flag FR indicating the ratio of the size of the enhancement layer VOP to the size of the base layer VOP, and supplies the result to the enhancement layer encoder 203. When the magnification (size ratio) is equal to 1, that is, when the enhancement layer and the base layer are equal in size, the resolution converter 205 directly outputs the received data without performing any process on it.

The enhancement layer image signal, the key signal, the flag FSZ_E indicating the size of the enhancement layer VOP, and the flag FPOS_E indicating the absolute coordinate position of the enhancement layer VOP, which are generated by the layered image signal generator 201, are supplied to the enhancement layer encoder 203 via a delay circuit 202. The delay circuit 202 delays the input signals by the time required for the base layer encoder 204 to encode the corresponding base layer VOP.

Referring now to FIG. 11, the enhancement layer encoder 203 is described. In FIG. 11, similar elements to those in FIG. 44 are denoted by similar reference numerals.

An input image signal is supplied to a set of frame memories 1, and stored therein in the predetermined order. The set of frame memories 1 stores the image signal of the VOP, the flag FSZ_E indicating the size of the VOP, and the flag FPOS_E indicating the absolute coordinate position of ...

... P-pictures and B-pictures are subjected to interframe prediction encoding, although the encoding mode for P-pictures and B-pictures is adaptively varied macroblock by macroblock in accordance with the prediction mode, as will be described later.

In the case of the spatially scalable encoding, the encoding is performed, for example, as shown in FIG. 12. A first VOP in the enhancement layer is encoded as a P-picture. In this case, a VOP in the base layer which is equal in time to the first VOP in the enhancement layer is employed as a reference image. The second and subsequent VOPs in the enhancement layer are encoded as B-pictures. In the encoding of these VOPs, the image in the base layer which is equal in time to the VOP being encoded and the immediately preceding VOP in the enhancement layer are employed as reference images. As in the case of P-pictures in the base layer, B-pictures in the enhancement layer are employed as prediction reference images when other VOPs are encoded.

The SNR scalability is a special case of the spatial scalability in which the enhancement layer and the base layer are equal in size to each other.

In the case of the temporal scalability, encoding is performed, for example, as shown in FIG. 2. A VOP1 is encoded as a B-picture wherein the VOP0 and VOP2 in the base layer are used as prediction reference images. A VOP3 is encoded as a B-picture wherein the VOP2 in the base layer immediately preceding the VOP3 and the VOP4 in the base layer immediately after the VOP3 are employed as reference images. Similarly, a VOP5 is encoded as a B-picture wherein the VOP4 in the base layer immediately preceding the VOP5 and the VOP6 in the base layer immediately after the VOP5 are employed as reference images.

The process of predicting P- and B-pictures in the enhancement layer is described below. In the prediction in the enhancement layer, not only an image in the same layer but also an image in other layers (scalable layers) may be employed as a reference image. For example, in the case of a two-layer scalability, prediction of images in a higher layer (enhancement layer) may be performed using images in a lower layer (base layer). For each scalable layer, a flag ref_layer_id is set to indicate which layer other than the layer itself is employed as a reference image, and the flag ref_layer_id is encoded and transmitted. Furthermore, a flag ref_select_code is set to indicate from which layer forward prediction and backward prediction are performed in accordance with the flag ref_layer_id, and this flag ref_select_code is also encoded and transmitted. Table 1 shows a flag ref_select_code for a P-picture, and Table 2

the VOP. shows a flag ref_jselect_code for a B-picture. The syntax 

The image signal to be encoded is input macroblock by associated with these flags will be described later, 
macroblock to a motion vector extraction circuit 232, The 

motion vector extraction circuit 232 processes the image TABLE 1 
data for each frame as an I-picture, a P-picture, or a 60 
B-picture according to a predetermined procedure. In this 
procedure, the processing mode is predefined for each frame 
of the image sequence, and each frame is processed as an 
I-picture, a P-picture, or a B-picture corresponding to the 
predefined processing mode (for example frames are pro- 6S 
cesses in the order of I, B, P, B, P, . . . , B, P). Basically, 
I-pictures are subjected to intraframe encoding, and 



ref_select_code   forward prediction reference
00                finally decoded VOP in the same layer
01                finally displayed VOP in the reference layer
10                VOP in the reference layer to be displayed next
11                VOP in the reference layer equal in time (motion vector is not transmitted)
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TABLE 2 





ref_select_code   forward temporal reference                     backward temporal reference
00                finally decoded VOP in the same layer          VOP in the reference layer equal in time (motion vector is not transmitted)
01                finally decoded VOP in the same layer          finally displayed VOP in the reference layer
10                finally decoded VOP in the same layer          VOP in the reference layer to be displayed next
11                finally displayed VOP in the reference layer   VOP in the reference layer to be displayed next





The method of prediction in the enhancement and base 
layers is not limited to those shown in FIGS. 2 and 12, but 
the prediction may be performed in various manners as long 
as the requirements shown in Tables 1 and 2 are satisfied. In 
the syntax shown in Tables 1 and 2, there is no explicit 
designation about the spatial or temporal scalability. 

In the case of a P-picture, when ref_select_code is '11', a VOP equal in time in the layer (reference layer) indicated by ref_layer_id is employed as a prediction reference image. This mode is used in the spatial scalability and the SNR scalability. The other modes, '00', '01', and '10', are used in the temporal scalability.

In the case of a B-picture, when ref_select_code is '00', a VOP equal in time in the layer indicated by ref_layer_id and an immediately preceding decoded VOP in the same layer are used as prediction reference images. This mode is used in the spatial scalability and the SNR scalability. The other modes, '01', '10', and '11', are used in the temporal scalability.
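Tables 1 and 2 amount to a lookup from the two-bit ref_select_code to the reference VOPs. The Python sketch below encodes them as dictionaries and, for the temporal modes, finds the reference-layer VOPs displayed immediately before and after the current VOP, as in the FIG. 2 arrangement. The names P_FORWARD, B_REFERENCES, same_time_mode and temporal_neighbours are hypothetical, and representing VOPs by integer display times is an assumption made only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Table 1: forward prediction reference of a P-picture, keyed by ref_select_code.
P_FORWARD: Dict[str, str] = {
    "00": "finally decoded VOP in the same layer",
    "01": "finally displayed VOP in the reference layer",
    "10": "VOP in the reference layer to be displayed next",
    "11": "VOP in the reference layer equal in time",
}

# Table 2: (forward, backward) temporal references of a B-picture.
B_REFERENCES: Dict[str, Tuple[str, str]] = {
    "00": ("finally decoded VOP in the same layer",
           "VOP in the reference layer equal in time"),
    "01": ("finally decoded VOP in the same layer",
           "finally displayed VOP in the reference layer"),
    "10": ("finally decoded VOP in the same layer",
           "VOP in the reference layer to be displayed next"),
    "11": ("finally displayed VOP in the reference layer",
           "VOP in the reference layer to be displayed next"),
}


def same_time_mode(picture_type: str, ref_select_code: str) -> bool:
    """True for the modes used in spatial/SNR scalability ('11' for P, '00' for B)."""
    return (picture_type, ref_select_code) in {("P", "11"), ("B", "00")}


def temporal_neighbours(ref_layer_times: List[int],
                        t: int) -> Tuple[Optional[int], Optional[int]]:
    """Display times of the reference-layer VOPs shown last before t and next
    after t (None where no such VOP exists)."""
    before = [s for s in ref_layer_times if s < t]
    after = [s for s in ref_layer_times if s > t]
    return (max(before) if before else None, min(after) if after else None)
```

For FIG. 2, with base layer VOPs at times 0, 2, 4 and 6, `temporal_neighbours([0, 2, 4, 6], 3)` gives `(2, 4)`, i.e. VOP2 and VOP4 as the references of VOP3 (ref_select_code '11' for a B-picture).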

Which of the I-, P-, and B-picture types is employed in encoding each VOP in each layer is determined in advance. The motion vector extraction circuit 232 sets the flags ref_layer_id and ref_select_code according to the predefined picture type, and supplies these flags to the motion compensation circuit 12 and the variable-length encoder 6.

The decoded image signal and key signal in the base layer 
are supplied to the enhancement layer encoder 203 via the 
resolution converter 205, and stored in the set of frame 
memories 235 therein. The decoded image signal supplied 
herein to the resolution converter 205 has been subjected to 
the pixel replacement process in the pixel replacement 
circuit 221 shown in FIG. 8. 

The flag FSZ_B indicating the size of the base layer VOP 
and the flag FPOS_B indicating the absolute coordinate 
position thereof are stored in the set of frame memories 235 
and also supplied to the motion vector extraction circuit 232 
and the motion compensation circuit 12. 
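The resolution converter 205 is only said to use a filtering operation controlled by FR. The following minimal Python sketch assumes an integer magnification and uses nearest-neighbour pixel replication as a stand-in for that unspecified filter; the function name convert_resolution is hypothetical. Since the base layer key signal passes through the same converter, the routine would be applied to it as well as to the image.

```python
from typing import List

Image = List[List[int]]


def convert_resolution(image: Image, fr: int) -> Image:
    """Up-sample a base layer plane by the integer factor fr.

    Nearest-neighbour replication is an assumption standing in for the
    filter of resolution converter 205; it is not specified in the text.
    """
    if fr == 1:
        # Equal sizes: the data is output without any processing.
        return image
    if fr < 1:
        raise ValueError("this sketch only handles integer magnification >= 1")
    out: Image = []
    for row in image:
        wide = [value for value in row for _ in range(fr)]  # repeat horizontally
        for _ in range(fr):                                  # repeat vertically
            out.append(list(wide))
    return out
```

For example, `convert_resolution([[1, 2], [3, 4]], 2)` yields a 4x4 plane in which each base layer pixel covers a 2x2 area.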

The motion vector extraction circuit 232 refers to a predetermined reference frame stored in the set of frame memories 1 or 235, and performs motion compensation (interframe prediction), thereby extracting a motion vector. The motion compensation (interframe prediction) is performed in one of three modes: forward, backward, and forward-and-backward prediction modes. The prediction for a P-picture is performed only in the forward prediction mode, while the prediction for a B-picture is performed in one of the three modes described above. The motion vector extraction circuit 232 selects the prediction mode that leads to the minimum prediction error, and outputs the motion vector and the prediction mode.
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The prediction error is compared for example with the 
dispersion of the given macroblock to be encoded. If the dispersion of the macroblock is smaller than the prediction error, predictive encoding is not performed on that macroblock; instead, intraframe encoding is performed. In this case, the prediction mode is referred to as the intraframe encoding mode. The motion vector and the information about the prediction mode are supplied to the variable-length encoder 6 and the motion compensation circuit 12.
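The mode decision described above (the prediction mode with the smallest error, then a fall-back to intraframe encoding when the macroblock's dispersion is smaller still) can be sketched in Python as follows. It assumes the prediction errors have already been computed and are expressed as mean squared errors, so that they are directly comparable with the variance used here as the "dispersion"; the function names are hypothetical.

```python
from typing import Dict, List


def dispersion(block: List[int]) -> float:
    """Variance of the macroblock's pixel values about their mean."""
    mean = sum(block) / len(block)
    return sum((v - mean) ** 2 for v in block) / len(block)


def choose_mode(block: List[int], errors: Dict[str, float], picture_type: str) -> str:
    """Return the prediction mode with the smallest error, or 'intra' when the
    block's dispersion is smaller than that error."""
    if picture_type == "P":
        allowed = ["forward"]  # P-pictures: forward prediction only
    else:
        allowed = ["forward", "backward", "bidirectional"]
    best = min(allowed, key=lambda m: errors[m])
    if dispersion(block) < errors[best]:
        return "intra"  # intraframe encoding mode
    return best
```

A flat macroblock has zero dispersion, so any non-zero prediction error sends it to the intraframe encoding mode.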

The motion vector extraction circuit 232 in the enhancement layer encoder 203, as in the case of the motion vector extraction circuit 222 in the base layer, receives the locally decoded key signal associated with the macroblock to be encoded, the key signal being locally decoded by the key signal decoder 234. In this case, the key signal decoder 234 outputs a decoded enhancement layer key signal. In the calculation of the prediction error, the motion vector extraction circuit 232, as in the case of the motion vector extraction circuit 222 in the base layer, neglects the difference values between pixel values of the predicted image and pixel values whose associated key signal is equal to 0. That is, the prediction error is calculated using only those pixels which are located within an image object and whose associated key signal has a value not equal to 0, and the motion vector which gives the minimum prediction error is detected.
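A minimal full-search sketch of the key-masked prediction error, assuming a sum of absolute differences as the error measure and a square search window (the text fixes neither). Pixels whose key value is 0 are skipped, so for a rectangular VOP, whose key is never 0, this reduces to ordinary block matching. masked_sad and search_motion_vector are hypothetical names.

```python
from typing import List, Optional, Tuple

Plane = List[List[int]]


def masked_sad(cur: Plane, key: Plane, ref: Plane, top: int, left: int,
               size: int, dy: int, dx: int) -> Optional[int]:
    """SAD between the size x size block at (top, left) of cur and the block
    displaced by (dy, dx) in ref, counting only pixels whose key is non-zero.
    Returns None if the displaced block leaves the reference plane."""
    h, w = len(ref), len(ref[0])
    if not (0 <= top + dy and top + dy + size <= h
            and 0 <= left + dx and left + dx + size <= w):
        return None
    total = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            if key[y][x] == 0:
                continue  # outside the image object: not counted
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def search_motion_vector(cur: Plane, key: Plane, ref: Plane, top: int, left: int,
                         size: int, radius: int) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Full search over [-radius, radius]^2; returns (min error, (dy, dx))."""
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            err = masked_sad(cur, key, ref, top, left, size, dy, dx)
            if err is not None and (best is None or err < best[0]):
                best = (err, (dy, dx))
    return best
```

Masking matters when a pixel outside the object holds an arbitrary value: with its key set to 0, that pixel no longer pulls the search away from the true displacement.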

The motion vector extraction circuit 232 also receives a flag FR indicating the ratio of the size (resolution) of the enhancement layer to the size of the base layer. As can be seen from Table 2, in the case of a B-picture (VOP), if ref_select_code='00', the encoding is performed in the spatially scalable mode. In this case, backward prediction is performed by referring to a VOP equal in time in the base layer (reference layer), and forward prediction is performed by referring to an immediately preceding decoded VOP in the same layer. If the magnification flag is equal to 1 (the base layer and the enhancement layer are equal in resolution to each other) and ref_select_code='00', the encoding is performed in the SNR scalable mode, which is a special case of the spatial scalability. In this case, backward prediction in the enhancement layer is performed using the motion vector used in the prediction of the VOP equal in time in the base layer, in the same prediction mode. Therefore, in this case, the motion vector extraction circuit 232 directly supplies to the motion compensation circuit 12 the motion vector and the information about the prediction mode supplied from the base layer. Thus, in this case, the variable-length encoder 6 does not encode the motion vector.
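The choice between spatial, SNR and temporal handling described above depends only on the picture type, ref_select_code and FR. A hedged Python sketch of that decision follows. scalability_mode and reuses_base_layer_motion are hypothetical helpers, and treating a P-picture with ref_select_code '11' and FR equal to 1 as the SNR case is an assumption based on the statement that this mode serves both spatial and SNR scalability.

```python
def scalability_mode(picture_type: str, ref_select_code: str, fr: int) -> str:
    """Classify an enhancement layer VOP as 'spatial', 'snr' or 'temporal'."""
    same_time = (picture_type, ref_select_code) in {("P", "11"), ("B", "00")}
    if not same_time:
        return "temporal"
    # SNR scalability is the special case of spatial scalability with FR == 1.
    return "snr" if fr == 1 else "spatial"


def reuses_base_layer_motion(picture_type: str, ref_select_code: str, fr: int) -> bool:
    """B-picture in SNR mode: the base layer motion vector and prediction mode
    are handed straight to motion compensation circuit 12, and no motion
    vector is variable-length encoded for the backward prediction."""
    return (picture_type == "B"
            and scalability_mode(picture_type, ref_select_code, fr) == "snr")
```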

According to the motion vector, the motion compensation circuit 12 generates a predicted image signal from the image signal stored in the set of frame memories 11 or 235, and supplies the resultant signal to the arithmetic operation circuit 3. The arithmetic operation circuit 3 calculates the difference between the value of the macroblock to be encoded and the value of the predicted image signal, and supplies the resultant difference image signal to the DCT circuit 4. In the case of an intramacroblock, the arithmetic operation circuit 3 directly transfers the value of the given macroblock to be encoded to the DCT circuit 4 without performing any operation.

The DCT circuit 4 performs a DCT (discrete cosine transform) operation on the received image signal, thereby converting it to DCT coefficients. The DCT coefficients are input to the quantization circuit 5 and quantized according to a quantization step corresponding to the amount of data stored in the transmission buffer 7. The resultant quantized data is supplied to the variable-length encoder 6.
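The difference, DCT and quantization stages of circuits 3, 4 and 5 can be illustrated for a single 8x8 block with the Python sketch below. It uses the orthonormal DCT-II computed directly from its definition and a plain divide-and-round quantizer with step qstep. The actual quantization rule and the mapping from buffer occupancy to qstep are not given in the text, so both are assumptions, as are the function names.

```python
import math
from typing import List

Block = List[List[float]]
N = 8  # block size


def dct_2d(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block, straight from the definition."""
    def scale(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def encode_block(cur: Block, pred: Block, qstep: float, intra: bool) -> List[List[int]]:
    """Difference (circuit 3), DCT (circuit 4) and uniform quantization (circuit 5)."""
    if intra:
        residual = cur  # intramacroblock: passed through unchanged
    else:
        residual = [[cur[i][j] - pred[i][j] for j in range(N)] for i in range(N)]
    coeffs = dct_2d(residual)
    return [[int(round(c / qstep)) for c in row] for row in coeffs]
```

A flat intra block of value 16 yields a single DC coefficient of 128, which quantizes to 16 with qstep 8; a perfectly predicted inter block quantizes to all zeros.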
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The variable-length encoder 6 converts the quantized data supplied from the quantization circuit 5 into a variable-length code using, for example, the Huffman encoding technique, in accordance with the quantization step (scale) supplied from the quantization circuit 5. The obtained variable-length code is supplied to the transmission buffer 7.

The variable-length encoder 6 also receives the quantization step (scale) from the quantization circuit 5, and the motion vector as well as the information indicating the prediction mode (that is, the information indicating in which of the intraframe prediction mode, the forward prediction mode, the backward prediction mode, or the forward-and-backward prediction mode the prediction has been performed) from the motion vector extraction circuit 232, and converts these received data into variable-length codes.

The variable-length encoder 6 also encodes the flag FSZ_E indicating the size of the enhancement layer VOP, the flag FPOS_E indicating the absolute coordinate position thereof, and the flag FR indicating the ratio of the resolution of the enhancement layer to the resolution of the base layer.

The variable-length encoder 6 inserts a key signal bit stream at a predetermined position in the encoded image signal bit stream, and supplies the resultant bit stream to the transmission buffer 7.

The key signal of the enhancement layer VOP to be encoded is input to the key signal encoder 233. The key signal is encoded according to a predetermined encoding method such as DPCM, and the resultant key signal bit stream is supplied to the variable-length encoder 6 and the key signal decoder 234. The key signal bit stream is decoded by the key signal decoder 234, and the resultant signal is supplied to the motion vector extraction circuit 232 and the motion compensation circuit 12.

The transmission buffer 7 stores the received data temporarily. The information representing the amount of data stored in the transmission buffer 7 is fed back to the quantization circuit 5. If the amount of residual data stored in the transmission buffer 7 reaches an upper allowable limit, the transmission buffer 7 supplies a quantization control signal to the quantization circuit 5 so that the following quantization operation is performed using an increased quantization scale, thereby decreasing the amount of quantized data. Conversely, if the amount of residual data decreases to a lower allowable limit, the transmission buffer 7 supplies a quantization control signal to the quantization circuit 5 so that the following quantization operation is performed using a decreased quantization scale, thereby increasing the amount of quantized data. In this way, overflow or underflow of the transmission buffer 7 is prevented.

The data stored in the transmission buffer 7 is read out at a specified time and multiplexed by the multiplexer 206 shown in FIG. 1 with the base layer bit stream. The multiplexed signal is then supplied to the multiplexer 104 shown in FIG. 51.

The quantized data output from the quantization circuit 5 is input to the inverse quantization circuit 8 and subjected to an inverse quantization process in accordance with the quantization step supplied from the quantization circuit 5. The data (dequantized DCT coefficients) output from the inverse quantization circuit 8 is input to the IDCT (inverse DCT) circuit 9 and is subjected to an inverse DCT process therein. The resultant value is then added by the arithmetic operation circuit 10 to a predicted image value for each block according to the prediction mode. The resultant image signal is then supplied to the pixel replacement circuit 231 for use in generating a further predicted image. The image signal is subjected to a pixel replacement process in the pixel replacement circuit 231, and the resultant image signal is stored in the set of frame memories 11. In the case of an intramacroblock, the arithmetic operation circuit 10 directly transfers the macroblock output by the IDCT circuit 9 to the pixel replacement circuit 231 without performing any operation.

The pixel replacement circuit 231 of the enhancement layer encoder 203 is described in further detail below. As described earlier with reference to FIG. 9, the pixel replacement circuit 221 in the base layer replaces the values of pixels in a non-image area outside an image object with the values of pixels located at the periphery of the area in which there is an image.

In contrast, the pixel replacement circuit 231 in the enhancement layer performs not only a replacement process similar to that performed by the pixel replacement circuit 221 in the base layer but also a pixel replacement process using a decoded base layer reference image output from the set of frame memories 235.

The scalable encoding method performed VO by VO will be described below. The size and/or position of the VO may vary with time or may be constant. As described earlier, the scalable encoding for each VO is performed in different ways depending on which of the following two types the VO belongs to:

1. The enhancement layer includes the entire area of the base layer.
2. The enhancement layer corresponds to a partial area of the base layer.

In type 2, the base layer has information about an area which is not included in the enhancement layer. In particular, in the spatial scalability between the enhancement layer and the base layer, it is possible to use a base layer reference image converted in resolution.

FIG. 13 illustrates an example of the pixel replacement process performed by the pixel replacement circuit 231 in the enhancement layer. In an area containing an image in which the corresponding key signals have a value not equal to 0 (for example, the image object area in FIG. 13), the image in the enhancement layer is directly employed. In the other area (the area drawn with horizontal lines in FIG. 13), the reference image is obtained by replacing pixel values in the enhancement layer with pixel values of the base layer image which has been converted in resolution (subjected to an up-sampling process), at locations corresponding to the locations of the reference image (UVOP0 in FIG. 13).

FIG. 14 is a flow chart illustrating the process performed by the pixel replacement circuit 231. If it is determined in step S21 that the pixel is within an image object, that is, the corresponding key signal has a value not equal to 0, then the process goes to step S22, in which the pixel replacement circuit 231 directly outputs the pixel value without performing any process on that pixel. When the corresponding key signal is equal to 0, the process goes to step S23, and 0 is substituted into that pixel. In the case where the VOP has a rectangular shape, the key signal always has a value which is not equal to 0 (1 in the case of a binary key, 255 in the case of a gray scale key). Therefore, in this case, all pixels of the VOP are directly output without being subjected to any process.

The replacement mode is then determined in step S24, and a replacement process is performed according to that replacement mode.

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The replacement mode is described in further detail below. The replacement in the enhancement layer is performed in either of two modes. In a first mode, the replacement is performed in the same manner as the replacement performed in the base layer. In the other mode, pixel values in the enhancement layer are replaced with the values of pixels of a reference image in the base layer at corresponding locations. The latter mode is employed when the enhancement layer corresponds to a partial area of the base layer and the encoding is performed in a spatially scalable manner. The scalability mode and the replacement mode are both determined in advance. A one-bit flag fill_mode indicating the replacement mode is supplied from the pixel replacement circuit 231 to the variable-length encoder 6. The flag fill_mode is encoded by the variable-length encoder 6 and transmitted.

If the flag fill_mode indicating the replacement mode has a value equal to 0, the process goes to step S25, in which the pixel replacement circuit 231 performs replacement in the same manner as that (FIG. 9) performed by the pixel replacement circuit 221 (FIG. 8) in the base layer. The resultant image signal is output to the set of frame memories 11.

In the case where the flag fill_mode indicating the replacement mode has a value equal to 1, the process goes to step S26, in which the pixel replacement circuit 231 replaces the pixel values in the enhancement layer with the pixel values of the base layer reference image signal at corresponding locations. This replacement method is described in further detail below with reference to FIGS. 15 and 16.

In the example shown in FIG. 15, as in the example shown in FIG. 13, when an image VOP1 in the enhancement layer is encoded, an immediately preceding image VOP0 in the enhancement layer and a base layer image UVOP1 equal in time, which has been converted in resolution (expanded in size or up-sampled), are used as reference images. In this case, the pixel replacement circuit 231 replaces the pixel values in an area other than an image object in the image VOP0 with the values of pixels, at corresponding locations, of the image UVOP0 in the base layer, which is equal in time to VOP0 and has been converted in resolution (expanded in size or up-sampled).

In the replacement method shown in FIG. 16, which is a modification of the method described above, the pixel values in an area other than an image object in the image VOP0 are replaced with the values of pixels at corresponding locations of the image UVOP1, which is equal in time to the image VOP1 and which has been converted in resolution.

After completion of the replacement process, the pixel replacement circuit 231 outputs the resultant image signal to the set of frame memories 11.

Although in the method shown in FIG. 14 the replacement mode is switched in accordance with the flag fill_mode indicating the replacement mode, the replacement mode may instead be switched in accordance with the flag ref_select_code. In this case, replacement is performed as described below with reference to FIG. 17.

As shown in Table 1, when an enhancement layer VOP is to be encoded in the P-picture prediction mode, if the flag ref_select_code is equal to '11', the encoding is performed in the spatially scalable fashion. On the other hand, as shown in Table 2, when an enhancement layer VOP is to be encoded in the B-picture prediction mode, if the flag ref_select_code is equal to '00', the encoding is performed in the spatially scalable fashion (step S41). In the case where the enhancement layer VOP has an arbitrary shape and the base layer VOP has a rectangular shape, the enhancement layer corresponds to a partial area of the base layer (step S42). When the size of the enhancement layer VOP is compared with the size of the corresponding base layer VOP multiplied by the factor FR, if the size of the enhancement layer VOP is smaller, then the enhancement layer corresponds only to a partial area of the base layer (step S43). If it is determined in steps S41 to S43 that the encoding is performed in the spatially scalable fashion and that the enhancement layer corresponds only to a partial area of the base layer, the pixel values of the reference image are replaced with the pixel values of the base layer image which has been converted in resolution (step S44). In the other cases, the pixel replacement is performed in the same manner as in the case where the pixel replacement circuit 221 performs replacement on a base layer image.

The set of frame memories 11 stores the image signal output from the pixel replacement circuit 231, the flag FSZ_E indicating the size of the VOP, and the flag FPOS_E indicating the absolute coordinate position thereof.

As described above, the bit streams generated by the enhancement layer encoder 203 and the base layer encoder 204, respectively, are input to the multiplexer 206 as shown in FIG. 1. The multiplexer 206 multiplexes the enhancement layer bit stream and the base layer bit stream into a single bit stream, and supplies the resultant VO bit stream to the multiplexer 104 shown in FIG. 51. The multiplexer 104 multiplexes the bit streams supplied from the respective VOP encoders into a single bit stream, and outputs the resultant bit stream either over a transmission line or onto a recording medium.

FIG. 18 illustrates an example of a VOP decoder 112 corresponding to the VOP encoder 103 shown in FIG. 1 according to the first embodiment of the invention. In FIG. 18, the bit stream supplied to the VOP decoder via the transmission line or the recording medium is first demultiplexed into an enhancement layer bit stream and a base layer bit stream.

The base layer bit stream is directly supplied to a base layer decoder 254. On the other hand, the enhancement layer bit stream is supplied to an enhancement layer decoder 253 via a delay circuit 252.

The delay circuit 252 delays the enhancement layer bit stream by the time required for the base layer decoder 254 to decode one VOP, and then outputs the bit stream to the enhancement layer decoder 253.

A specific circuit configuration of the base layer decoder 254 is described below with reference to FIG. 19. In FIG. 19, elements similar to those in FIG. 45 are denoted by similar reference numerals.

After the base layer bit stream is stored temporarily in a reception buffer 21, it is supplied to a variable-length decoder 22. The variable-length decoder 22 performs variable-length decoding on the base layer bit stream supplied from the reception buffer 21, thereby supplying a motion vector and information representing the prediction mode to a motion compensation circuit 27, information representing the quantization step to an inverse quantization circuit 23, and the variable-length decoded data to the inverse quantization circuit 23.

The variable-length decoder 22 also decodes the flag FSZ_B indicating the size of the VOP and the flag FPOS_B indicating the absolute coordinate position thereof, and supplies the decoded flags to the motion compensation circuit 27, a set of frame memories 26, and a key signal decoder 262. The flags FSZ_B and FPOS_B are also
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supplied to an enhancement layer decoder 253. The variable- 
length decoder 22 also extracts a key signal bit stream, and 
supplies the extracted key signal bit stream to the key signal 
decoder 262. 

The key signal decoder 262 decodes the key signal bit stream supplied from the variable-length decoder 22 in accordance with a decoding method corresponding to the encoding method employed. The decoded key signal is supplied to an IDCT circuit 24, the motion compensation circuit 27, and a pixel replacement circuit 261. The decoded key signal is also supplied to the enhancement layer decoder 253 via a resolution converter 255 shown in FIG. 18.
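The text names DPCM only as an example of the key signal encoding method, and the key signal decoder has to invert whichever method the encoder used. Assuming a simple lossless row-wise DPCM in which each key sample is predicted from its left neighbour (and the first sample of a row from 0), encoding and decoding could look like the Python sketch below; the function names are hypothetical, and the entropy coding of the differences is omitted.

```python
from typing import List


def dpcm_encode_row(row: List[int]) -> List[int]:
    """Replace each key sample by its difference from the left neighbour."""
    prev = 0
    diffs = []
    for value in row:
        diffs.append(value - prev)
        prev = value
    return diffs


def dpcm_decode_row(diffs: List[int]) -> List[int]:
    """Invert dpcm_encode_row by accumulating the differences."""
    prev = 0
    row = []
    for d in diffs:
        prev += d
        row.append(prev)
    return row


def dpcm_encode(key: List[List[int]]) -> List[List[int]]:
    return [dpcm_encode_row(r) for r in key]


def dpcm_decode(diffs: List[List[int]]) -> List[List[int]]:
    return [dpcm_decode_row(r) for r in diffs]
```

Inside and outside a gray scale object the differences are mostly 0, with non-zero values only at the object boundary, which is what makes DPCM attractive for key signals.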

The inverse quantization circuit 23 performs inverse quantization on the quantized data supplied from the variable-length decoder 22 block by block in accordance with the quantization step also supplied from the variable-length decoder 22. The resultant signal is supplied to the IDCT circuit 24. The IDCT circuit 24 performs an inverse DCT process on the data (DCT coefficients) output by the inverse quantization circuit 23, and supplies the resultant data to an arithmetic operation circuit 25.

In the case where the image signal supplied from the IDCT circuit 24 is I-picture data, the image signal is directly output via the arithmetic operation circuit 25 without being subjected to any process, and is stored in the set of frame memories 26 via the pixel replacement circuit 261 for use in generating a predicted image signal of an image signal which will be input later to the arithmetic operation circuit 25. The image signal output from the arithmetic operation circuit 25 is directly output to an image reconstruction circuit 113 shown in FIG. 52.

When the image signal supplied from the IDCT circuit 24 is a P-picture or a B-picture, the motion compensation circuit 27 generates a predicted image signal in accordance with the motion vector and the information representing the prediction mode supplied from the variable-length decoder 22, and outputs the resultant signal to the arithmetic operation circuit 25. The arithmetic operation circuit 25 adds the predicted image signal supplied from the motion compensation circuit 27 to the image signal supplied from the IDCT circuit 24, thereby creating a reproduced image signal. When the image signal supplied from the IDCT circuit 24 is a P-picture, the image signal output from the arithmetic operation circuit 25 is also stored in the set of frame memories 26 via the pixel replacement circuit 261 so that it can be used as a reference image in the process of decoding a subsequent image signal. However, in the case of an intramacroblock, the arithmetic operation circuit 25 simply transfers the image signal supplied from the IDCT circuit 24 to the output without performing any process on it.
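The decoding rules just described can be summarized per block: I-picture data and intramacroblocks pass through unchanged, P- and B-pictures have the motion-compensated prediction added, and, as described for the base layer decoder, only I- and P-pictures are kept (after pixel replacement) as references. The Python sketch below assumes blocks are lists of integer rows and omits the pixel replacement and any clipping; reconstruct_block and stored_as_reference are hypothetical names.

```python
from typing import List, Optional

Block = List[List[int]]


def reconstruct_block(residual: Block, picture_type: str,
                      prediction: Optional[Block] = None,
                      intra_macroblock: bool = False) -> Block:
    """Arithmetic operation circuit 25: pass I-picture data and intramacroblocks
    through, otherwise add the motion-compensated prediction."""
    if picture_type == "I" or intra_macroblock:
        return residual
    if prediction is None:
        raise ValueError("P- and B-pictures need a predicted block")
    return [[r + p for r, p in zip(rrow, prow)]
            for rrow, prow in zip(residual, prediction)]


def stored_as_reference(picture_type: str) -> bool:
    """I- and P-pictures are written to frame memories 26 (via pixel
    replacement circuit 261); B-pictures are only output."""
    return picture_type in ("I", "P")
```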

The pixel replacement circuit 261 performs pixel replacement in a manner similar to the pixel replacement circuit 221 (FIG. 8) in the encoder (as shown in the flow chart of FIG. 9).

In FIG. 18, the base layer image signal and key signal 
decoded by the base layer decoder 254 are supplied to the 
image reconstruction circuit 113 shown in FIG. 52. The 
decoded base layer image signal and key signal are also 
supplied to the resolution converter 255.

On the other hand, the flag FSZ_B indicating the size of the base layer image VOP and the flag FPOS_B indicating the absolute coordinate position thereof, decoded by the base layer decoder 254, are supplied to the image reconstruction circuit 113 shown in FIG. 52 and also to the enhancement layer decoder 253.

The enhancement layer bit stream obtained by the demultiplexer 251 through the demultiplexing process is supplied to the enhancement layer decoder 253 via the delay circuit 252.
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The enhancement layer decoder 253 is described in fur- 
ther detail below with reference to FIG. 20. In FIG. 20, 
similar elements to those in FIG. 45 are denoted by similar 
reference numerals. 

After the enhancement layer bit stream is stored temporarily in a reception buffer 21, it is supplied to a variable-length decoder 22. The variable-length decoder 22 performs variable-length decoding on the enhancement layer bit stream supplied from the reception buffer 21. The variable-length decoder 22 outputs a motion vector and information indicating the associated prediction mode to a motion compensation circuit 27. The variable-length decoder 22 also supplies a quantization step to an inverse quantization circuit 23. Furthermore, the variable-length decoded data is supplied from the variable-length decoder 22 to the inverse quantization circuit 23.

The variable-length decoder 22 also decodes the flag FSZ_E indicating the size of the VOP and the flag FPOS_E indicating the absolute coordinate position thereof, and supplies the decoded flags to the motion compensation circuit 27, a set of frame memories 26, and a key signal decoder 274.

Furthermore, the variable-length decoder 22 also decodes the flag FR indicating the ratio of the size (resolution) of the enhancement layer image VOP to the size (resolution) of the base layer image VOP, and supplies the result to the motion compensation circuit 27 and the resolution converter 255 shown in FIG. 18.

According to the flag FR indicating the size ratio (magnification), the resolution converter 255 performs resolution conversion on the decoded base layer image signal and its associated key signal using a filter. The resultant signals are supplied to a set of frame memories 273 in the enhancement layer decoder 253.

The variable-length decoder 22 also decodes the flag ref_layer_id indicating the reference layer used in prediction and the flag ref_select_code, and supplies the results to the motion compensation circuit 27. Furthermore, the variable-length decoder 22 decodes the flag fill_mode indicating the replacement mode and supplies the result to a pixel replacement circuit 271. The variable-length decoder 22 also extracts the key signal bit stream and supplies the extracted key signal bit stream to the key signal decoder 274.
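Because the decoder must reproduce the encoder's reference images, pixel replacement circuit 271 presumably applies the same fill_mode-controlled replacement as circuit 231 in the encoder (FIG. 14), while the FIG. 17 variant derives the mode from ref_select_code and the layer geometry instead. The Python sketch below is illustrative only: the base-layer replacement of FIG. 9 is replaced by a hypothetical nearest-object-pixel fill along each row, the base layer image is assumed to be already up-sampled to the enhancement layer grid, and "smaller" in step S43 is taken to mean smaller in either dimension. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Plane = List[List[int]]


def pad_rows(img: Plane, key: Plane) -> Plane:
    """Hypothetical stand-in for the FIG. 9 base-layer replacement: each pixel
    outside the object takes the nearest object pixel on its row (0 if none)."""
    out = []
    for row, krow in zip(img, key):
        inside = [x for x, k in enumerate(krow) if k != 0]
        new_row = []
        for x, value in enumerate(row):
            if krow[x] != 0:
                new_row.append(value)
            elif inside:
                nearest = min(inside, key=lambda i: abs(i - x))
                new_row.append(row[nearest])
            else:
                new_row.append(0)
        out.append(new_row)
    return out


def replace_pixels(img: Plane, key: Plane, fill_mode: int,
                   base_up: Optional[Plane] = None) -> Plane:
    """FIG. 14: keep object pixels (S21/S22), set the others to 0 (S23), then
    fill them per fill_mode: 0 -> base-layer style replacement (S25),
    1 -> up-sampled base layer pixels at the same locations (S26)."""
    cleared = [[v if k != 0 else 0 for v, k in zip(r, kr)] for r, kr in zip(img, key)]
    if fill_mode == 0:
        return pad_rows(cleared, key)
    if base_up is None:
        raise ValueError("fill_mode 1 needs the up-sampled base layer image")
    return [[v if k != 0 else b for v, k, b in zip(r, kr, br)]
            for r, kr, br in zip(cleared, key, base_up)]


def use_base_layer_fill(picture_type: str, ref_select_code: str,
                        enh_arbitrary: bool, base_rectangular: bool,
                        enh_size: Tuple[int, int], base_size: Tuple[int, int],
                        fr: int) -> bool:
    """FIG. 17, steps S41-S43: True when step S44 (base-layer fill) applies."""
    spatial = (picture_type, ref_select_code) in {("P", "11"), ("B", "00")}  # S41
    if not spatial:
        return False
    if enh_arbitrary and base_rectangular:  # S42: partial area of the base layer
        return True
    # S43: enhancement VOP smaller than the base layer VOP scaled by FR
    return enh_size[0] < base_size[0] * fr or enh_size[1] < base_size[1] * fr
```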

The key signal decoder 274 decodes the key signal bit stream supplied from the variable-length decoder 22 in accordance with a decoding method corresponding to the encoding method employed. The decoded key signal is supplied to an IDCT circuit 24, the motion compensation circuit 27, and the pixel replacement circuit 271.

The inverse quantization circuit 23 performs inverse quantization, block by block, on the data (quantized DCT coefficients) supplied from the variable-length decoder 22 in accordance with the quantization step, which is also supplied from the variable-length decoder 22. The resultant signal is supplied to the IDCT circuit 24. The IDCT circuit 24 performs an inverse DCT process on the data (DCT coefficients) output from the inverse quantization circuit 23, and supplies the resultant data to an arithmetic operation circuit 25.
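The inverse quantization and inverse DCT stages can be sketched for a single 8x8 block. The exact inverse quantization rule is not given in this text, so a hypothetical uniform rule (coefficient = level × quantization step) is assumed. The IDCT is the standard direct 2-D form for N = 8, written for clarity rather than speed.

```python
import math

N = 8  # block size


def dequantize(levels, qstep):
    """Hypothetical uniform inverse quantization: coefficient = level * qstep."""
    return [[lv * qstep for lv in row] for row in levels]


def idct_8x8(coeffs):
    """Direct 2-D inverse DCT of an 8x8 block; coeffs[v][u] holds frequency (u, v)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for v in range(N):
                for u in range(N):
                    s += (c(u) * c(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = s / 4
    return out


def decode_block(levels, qstep):
    """Inverse quantization followed by IDCT, as in circuits 23 and 24."""
    return idct_8x8(dequantize(levels, qstep))
```

For example, a block whose only nonzero level is a DC value of 8, decoded with a step of 4, yields a flat 8x8 block of value 4.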

In the case where the image signal supplied from the IDCT circuit 24 is I-picture data, the image signal is directly output via the arithmetic operation circuit 25 without being subjected to any process, and is stored in the set of frame memories 26 via the pixel replacement circuit 261 for use in generating a predicted image signal of an image signal
which will be input later to the arithmetic operation circuit 25. The image signal output from the arithmetic operation circuit 25 is directly output to the image reconstruction circuit 113 shown in FIG. 52.

When the image signal supplied from the IDCT circuit 24 is a P-picture or a B-picture, the motion compensation circuit 27 generates a predicted image signal from the image stored in the set of frame memories 26 or 273 in accordance with the motion vector, the information representing the prediction mode, and the flags ref_layer_id and ref_select_code indicating the reference layer, all supplied from the variable-length decoder 22. The resultant signal is supplied to the arithmetic operation circuit 25. The arithmetic operation circuit 25 adds the predicted image signal supplied from the motion compensation circuit 27 to the image signal supplied from the IDCT circuit 24, thereby creating a reproduced image signal. When the image signal supplied from the IDCT circuit 24 is a P-picture, the image signal output from the arithmetic operation circuit 25 is also stored in the set of frame memories 26 via the pixel replacement circuit 271 so that it can be used as a reference image in the process of decoding a subsequent image signal. However, in the case of an intramacroblock, the arithmetic operation circuit 25 simply transfers the image signal supplied from the IDCT circuit 24 to its output without performing any process on it.

The pixel replacement circuit 271 performs pixel replacement in a similar manner to the pixel replacement circuit 221 (FIG. 11) in the encoder (as shown in the flow chart of FIG. 14) in accordance with the decoded flag fill_mode indicating the replacement mode.

If the flag FR indicating the size ratio is equal to 1 and if ref_select_code="00", then the motion compensation circuit 27 generates a predicted image signal in accordance with the motion vector and information representing the prediction mode supplied from the base layer VOP equal in time, and supplies the resultant signal to the arithmetic operation circuit 25.

In FIG. 18, the decoded enhancement layer image signal, the key signal, the flag FSZ_E indicating the size of the enhancement layer VOP, the flag FPOS_E indicating the absolute coordinate position of the enhancement layer VOP, and the flag FR indicating the size ratio are supplied to the image reconstruction circuit 113 shown in FIG. 52.

In FIG. 52, the image reconstruction circuit 113 reconstructs an image signal in accordance with the image signal, the key signal, the flag indicating the size of the VOP, the flag indicating the absolute coordinate position of the VOP, and the flag FR indicating the size ratio, supplied from the VOP decoder 112. The resultant reconstructed image signal is output to the outside.

An example of scalable encoding syntax is described below.

FIG. 21 illustrates the structure of a bit stream. Herein, a VS (video session) refers to a set of VO (video object) bit streams. The syntax of VS is shown in FIG. 22.

FIG. 23 illustrates the syntax of VO (video object). A VO is a bit stream associated with the entire image or a part of an object in an image.

In FIG. 21, a VOL (video object layer) includes a plurality of VOPs and is a class used to realize scalability. The syntax of VOL is shown in FIG. 24. Each VOL is identified by a number indicated by video_object_layer_id. For example, if video_object_layer_id=0, then VOL0 is a base layer. If video_object_layer_id=1, VOL1 is an enhancement layer. The number of scalable layers can be set to an arbitrary value. A flag video_object_layer_shape indicates whether the corresponding VOL includes the entire image or an object which is a part of the image; that is, it indicates the shape of the corresponding VOL. The specific shapes indicated by this flag are shown in Table 3.

TABLE 3

A one-bit flag scalability indicates whether the corresponding VOL is an enhancement layer or a base layer. If scalability=1, the VOL is a base layer. In the other cases, the VOL is an enhancement layer.

A flag ref_layer_id indicates the number of the VOL used as a reference by the VOL being processed. This flag is transmitted only to the enhancement layer.

hor_sampling_factor_n and hor_sampling_factor_m indicate the ratio of the horizontal length of the enhancement layer to that of the base layer (the resolution ratio in the horizontal direction). The horizontal size of the enhancement layer relative to that of the base layer is given by

hor_sampling_factor_n/hor_sampling_factor_m

ver_sampling_factor_n and ver_sampling_factor_m indicate the ratio of the vertical length of the enhancement layer to that of the base layer (the resolution ratio in the vertical direction). The vertical size of the enhancement layer relative to that of the base layer is given by

ver_sampling_factor_n/ver_sampling_factor_m

fill_mode is a one-bit flag used to indicate the replacement mode. When this flag is equal to 1, the pixel replacement is performed using a base layer image which has been converted in resolution. This flag is transmitted only to the enhancement layer.

Flags VOP_width and VOP_height indicate the horizontal and vertical size of the VOP. Flags VOP_horizontal_spatial_mc_ref and VOP_vertical_spatial_mc_ref indicate the position of the corresponding VOP represented in absolute coordinates.

The flag ref_select_code indicates which layer image is used as a reference image in the forward prediction and backward prediction in accordance with the flag ref_layer_id. Specific values of ref_select_code are shown in Tables 1 and 2.

The bit stream output from the multiplexer 104 of the image signal encoder shown in FIG. 51 using the VOP encoder shown in FIG. 1 may be transmitted over a transmission line or recorded on a recording medium such as an optical disk, a magnetic disk, or a magneto-optical disk. The bit stream recorded on the recording medium can be reproduced and decoded by the image signal decoder shown in FIG. 52 using the VOP decoder 112 shown in FIG. 18.

Now a second embodiment of the present invention is described below. In this second embodiment, three layers are scalably encoded, although four or more layers may be scalably encoded in a similar manner according to the invention. In this embodiment, the VO-by-VO scalable encoding method described above is expanded to three-layer encoding. In this scalable encoding method for three layers, encoding for the base layer and a first enhancement layer is
performed in a similar manner to the first embodiment described above.

In the three-layer scalable encoding, there are two enhancement layers in addition to the base layer. That is, there are the base layer, the first enhancement layer, and the second enhancement layer. An image obtained by decoding the layers up to the second enhancement layer has better image quality than can be achieved by decoding the layers up to the first enhancement layer. Herein, the improvement in image quality refers to the improvement in spatial resolution in the case of spatial scalability encoding, the improvement in temporal resolution (frame rate) in the case of temporal scalability encoding, and the improvement in the SNR of an image in the case of SNR scalability encoding.

The first enhancement layer and the second enhancement layer can have three different relationships as described below.

1. The second enhancement layer includes the entire area 
of the first enhancement layer. 

2. The second enhancement layer corresponds to a partial area of the first enhancement layer.

3. The second enhancement layer corresponds to an area 
wider than the first enhancement layer. 

The relationship types 1 and 2 are similar to those which occur in the first embodiment described above.

The third type of relationship can occur when three or more layers are scalably encoded. More specifically, the third type of relationship occurs when the first enhancement layer corresponds to a partial area of the base layer and the second enhancement layer includes the entire area of the base layer, or when the first enhancement layer corresponds to a partial area of the base layer and the second enhancement layer corresponds to an area wider than the first enhancement layer and to a partial area of the base layer.

When the three layers have the third type of relationship described above, if decoding is performed using the base layer and the first enhancement layer, a part of the image in the base layer is improved in image quality. If decoding is performed by further using the second enhancement layer, a wider area or the entire image in the base layer is improved in image quality.

In the third type of relationship, the VOP can have either 
a rectangular shape or an arbitrary shape. 

Examples of scalable encoding processes for the third layer are shown in FIGS. 28 to 33. FIG. 28 illustrates an example in which a VOP has a rectangular shape and spatial scalability encoding is performed on it in the manner corresponding to the first type of relationship described above. FIG. 29 illustrates an example in which a VOP also has a rectangular shape but spatial scalability encoding is performed in the manner corresponding to the second type of relationship described above.

FIG. 30 illustrates an example in which VOPs in all layers have a rectangular shape and spatial scalability encoding is performed in the manner corresponding to the third type of relationship described above. FIG. 31 illustrates an example in which a VOP in the first enhancement layer has an arbitrary shape and a VOP in the second enhancement layer has a rectangular shape, wherein spatial scalability encoding is performed in the manner corresponding to the third type of relationship described above.

FIGS. 32 and 33 illustrate examples in which VOPs have an arbitrary shape and spatial scalability encoding is performed in the manner corresponding to the first type of relationship described above.

Which scalable encoding mode is employed is determined 
in advance. 
FIG. 34 illustrates an example of the circuit configuration of a VOP encoder 103 according to the second embodiment. Herein, the part used to encode the base layer and the first enhancement layer is constructed in the same manner as in the first embodiment (FIG. 1). The VOP encoder 103 of this second embodiment includes an additional part which is not included in the encoder shown in FIG. 1 and which is used to encode the second enhancement layer. The additional part includes a delay circuit 207, a second enhancement layer encoder 208, and a resolution converter 209. The first enhancement layer encoder 203 and the second enhancement layer encoder 208 have substantially the same construction.

Although the scalable encoding is performed for three layers in this embodiment, the technique used herein to expand two layers to three layers can also be employed to expand N layers to N+1 layers, thereby making it possible to apply the scalable encoding technique to an arbitrary number of layers.

An image signal of each VOP, a key signal, a flag FSZ indicating the size of the VOP, and a flag FPOS indicating the absolute coordinate position thereof are input to a layered image signal generator 201. The layered image signal generator 201 generates a plurality of image signals in separate layers from the input signals. For example, in the case of spatial scalability encoding, the layered image signal generator 201 reduces the input image signal and key signal at a proper ratio so as to generate an image signal and a key signal in the base layer. Similarly, the layered image signal generator 201 reduces the input image signal and key signal at a proper ratio so as to generate an image signal and a key signal in the first enhancement layer. Furthermore, the layered image signal generator 201 directly outputs the input image signal and key signal as an image signal and key signal in the second enhancement layer. Alternatively, the layered image signal generator 201 may perform resolution conversion at a proper ratio on the input image signal and key signal so as to generate an image signal and key signal in the second enhancement layer. In any case, the layered image signal generator 201 generates the first and second enhancement layers in accordance with a predetermined method.
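The spatial-scalability layering can be sketched as follows. The text leaves both the reduction filter and the "proper ratio" open, so this example assumes block averaging as the reduction filter and hypothetical reduction factors of 4 (base layer) and 2 (first enhancement layer), with the input passed through unchanged as the second enhancement layer. The same reduction is applied to the key signal, which is also an assumption; a binary key would normally need re-thresholding.

```python
def reduce(plane, factor):
    """Shrink a 2-D plane by an integer factor using block averaging
    (a simple reduction filter chosen for illustration only)."""
    h, w = len(plane), len(plane[0])
    if h % factor or w % factor:
        raise ValueError("plane size must be a multiple of the factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [plane[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out


def generate_spatial_layers(image, key, base_factor=4, enh1_factor=2):
    """Return {layer: (image, key)}; the factors are hypothetical defaults.

    The second enhancement layer is the input itself, as in the
    direct-output case described for generator 201.
    """
    return {
        "base": (reduce(image, base_factor), reduce(key, base_factor)),
        "enh1": (reduce(image, enh1_factor), reduce(key, enh1_factor)),
        "enh2": (image, key),
    }
```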

In the case of the temporal scalability (scalability along 
the time axis), the layered image signal generator 201 
switches the output image signal among the base layer 
image and the enhancement layer images depending on the 
time. 
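For temporal scalability, the text only says that the output is switched among the layers depending on the time. The sketch below assumes one hypothetical switching pattern for three layers: the base layer carries every fourth frame, the first enhancement layer the frame halfway between, and the second enhancement layer the remaining frames, so that each added layer doubles the frame rate. The pattern and function names are illustrative, not taken from the patent.

```python
# Hypothetical repeating pattern of layer assignments, indexed by frame time.
PATTERN = ("base", "enh2", "enh1", "enh2")


def route_frames(num_frames, pattern=PATTERN):
    """Map each frame index to the layer that carries it."""
    layers = {name: [] for name in dict.fromkeys(pattern)}
    for t in range(num_frames):
        layers[pattern[t % len(pattern)]].append(t)
    return layers


def frames_decodable(layers, up_to):
    """Frame indices available when decoding all layers named in up_to."""
    return sorted(t for name in up_to for t in layers[name])
```

With eight frames, decoding only the base layer gives frames 0 and 4, adding the first enhancement layer gives the even frames, and adding the second enhancement layer restores every frame.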

In the case of the SNR (signal-to-noise ratio) scalability, 
the layered image signal generator 201 supplies the input 
image signal and key signal directly to the respective layers. 
That is, the same image signal and key signal are supplied 
to the base layer and enhancement layers. 

In the case of spatial scalability, the layered image signal generator 201 performs resolution conversion on the input image signal and key signal, and supplies the resultant image signal and key signal to the base layer and the first enhancement layer. The resolution conversion is performed by means of a reduction filtering process, for example using a reduction filter. Alternatively, after the layered image signal generator 201 performs resolution conversion on the input image signal and key signal, the resultant image signal and key signal may be supplied to the first and second enhancement layers. In this case, the resolution conversion is performed by means of an expansion filtering process. Still alternatively, three separately generated image signals and associated key signals (which may or may not be equal in resolution) may be output from the layered image signal
generator 201 to the first and second enhancement layers and the base layer, respectively. In this case, which images are output to which layers is determined in advance.

The layered image signal generator 201 also outputs flags indicating the sizes and absolute coordinate positions of VOPs in the respective layers. For example, in the case of the VOP encoder shown in FIG. 34, a flag FSZ_B indicating the size of the base layer VOP and a flag FPOS_B indicating the absolute coordinate position of the base layer VOP are output to the base layer encoder 204. On the other hand, a flag FSZ_E1 indicating the size of the first enhancement layer VOP and a flag FPOS_E1 indicating the absolute coordinate position of the first enhancement layer VOP are output to the first enhancement layer encoder 203 via the delay circuit 202. Furthermore, a flag FSZ_E2 indicating the size of the second enhancement layer VOP and a flag FPOS_E2 indicating the absolute coordinate position of the second enhancement layer VOP are output to the second enhancement layer encoder 208 via the delay circuit 207.

Furthermore, the layered image signal generator 201 outputs a flag FR1 indicating the ratio of the size of the first enhancement layer VOP relative to the size of the base layer VOP to the resolution converter 205 and the first enhancement layer encoder 203 via the delay circuit 202.

Similarly, the layered image signal generator 201 outputs 
a flag FR2 indicating the ratio of the size of the second 
enhancement layer VOP relative to the size of the first 
enhancement layer VOP to the resolution converter 209 and 
the second enhancement layer encoder 208 via the delay 
circuit 207. 
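Because FR1 is defined relative to the base layer and FR2 relative to the first enhancement layer, the overall magnification from the base layer to the second enhancement layer is their product. The sketch below treats FR as a ratio of layer resolutions (consistent with the decoder description of FR as a size (resolution) ratio) and assumes a single horizontal value per layer; the function name and example resolutions are hypothetical.

```python
from fractions import Fraction


def size_ratios(base_res, enh1_res, enh2_res):
    """Return (FR1, FR2, FR1 * FR2) from the resolutions of the three layers."""
    fr1 = Fraction(enh1_res, base_res)   # first enhancement relative to base
    fr2 = Fraction(enh2_res, enh1_res)   # second enhancement relative to first
    return fr1, fr2, fr1 * fr2           # second enhancement relative to base
```

For instance, resolutions of 176, 352 and 704 give FR1 = 2, FR2 = 2 and an overall ratio of 4. Exact fractions are used so that non-integer ratios such as 3/2 are not rounded.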

In this second embodiment, a one-bit flag enhancement_type is set to indicate whether an image signal in the enhancement layer corresponds to the entire area or to a partial area of an image signal in a reference layer image, and the flag enhancement_type is encoded and transmitted. When the flag enhancement_type is equal to "0", the image signal in that layer corresponds to the entire area of the prediction reference layer image signal or to a wider area. When the flag enhancement_type is equal to "1", the image signal in that layer corresponds to a partial area of the prediction reference layer image signal. Which layer each layer refers to in prediction, and whether the image signal in each layer corresponds to the entire area or a partial area of its reference layer image, are determined in advance.
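The rule for setting enhancement_type can be expressed as a rectangle-containment test. This sketch assumes that the areas of a layer and of its reference layer are given as rectangles in a common absolute coordinate system (for example, derived from the FPOS and FSZ flags after any scaling); the Rect type and function names are hypothetical. A layer that covers the whole reference area, or more, gets "0"; anything else is treated as a partial area and gets "1".

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # left edge in absolute coordinates
    y: int  # top edge
    w: int  # width
    h: int  # height


def covers(outer: Rect, inner: Rect) -> bool:
    """True if outer contains inner entirely."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and outer.x + outer.w >= inner.x + inner.w
            and outer.y + outer.h >= inner.y + inner.h)


def enhancement_type(layer: Rect, reference: Rect) -> int:
    """0: entire reference area (or wider); 1: partial area of the reference."""
    return 0 if covers(layer, reference) else 1
```

In the three-layer examples of FIGS. 37 and 38, a first enhancement layer that is part of the base layer would receive 1, and a second enhancement layer wider than the first enhancement layer would receive 0.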

The layered image signal generator 201 generates image signals including particular areas with particular resolutions in the respective layers, and supplies the resultant image signals to the enhancement layer encoders 203 and 208, respectively, via the delay circuits 202 and 207. Furthermore, flags ref_layer_id indicating the layers referred to by the respective layers and flags enhancement_type indicating whether the respective layers correspond to the entire area or a partial area of the reference images are supplied from the layered image signal generator 201 to the enhancement layer encoders 203 and 208.

In FIG. 34, the delay circuit 202 and the resolution 
converter 205 operate in the same manner as in the first 
embodiment. 

Referring to FIG. 35, the first enhancement layer encoder 
203 is described below. The second enhancement layer 
encoder 208 has a similar circuit construction to that of the 
first enhancement layer encoder 203, and thus the descrip- 
tion about the first enhancement layer encoder 203 given 
herein below is also true for the second enhancement layer 
encoder 208. 

The first enhancement layer encoder 203 according to the second embodiment is similar to the enhancement layer encoder 203 (FIG. 11) according to the first embodiment except for the pixel replacement circuit 231 shown in FIG. 35.

Referring to FIG. 36, the operation of the pixel replacement circuit 231 shown in FIG. 35 will be described below. First, in step S61, the flag enhancement_type is checked so as to determine whether each layer is a part of a reference layer. If enhancement_type="0", then the process goes to step S66, and replacement is performed by means of intraframe extrapolation in a manner similar to that for the base layer as shown in the flow chart of FIG. 9.

If enhancement_type="1", then the process goes to step S62, and it is judged whether the corresponding key signal is equal to 0. When the pixel under judgement is within an image object, it is determined in step S62 that the key signal at a corresponding position has a value not equal to 0. In this case, the process goes to step S63, and the pixel replacement circuit 231 simply outputs the received pixel value without performing any replacement on it. On the other hand, if the corresponding key signal is equal to 0, the process goes to step S64 and the pixel value is replaced with 0. When the VOP has a rectangular shape, the key signal always has a value not equal to 0 (1 in the case of a binary key, 255 in the case of a gray scale key), and thus, in this case, all pixels of the VOP are simply output without being subjected to any process. In an area in which there is no image, key signals in that area have a value equal to 0, and thus the pixel values are replaced with 0.
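Steps S62 to S64 amount to masking the VOP with its key signal. A minimal sketch, assuming the pixel values and key signal are equally sized 2-D lists (the function name is hypothetical):

```python
def mask_outside_object(pixels, key):
    """Steps S62-S64: keep a pixel where its key value is nonzero (S63),
    otherwise replace it with 0 (S64)."""
    return [[p if k != 0 else 0 for p, k in zip(prow, krow)]
            for prow, krow in zip(pixels, key)]
```

For a rectangular VOP every key value is nonzero (1 or 255), so the output equals the input, matching the behaviour described above.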

Then in step S65, the flag fill_mode indicating the replacement mode is checked, and replacement is performed in accordance with the replacement mode indicated by the flag. The replacement in the enhancement layer is performed in either of the following two modes. In a first mode, the replacement is performed in the same manner as the replacement performed in the base layer. In the other mode, pixel values in the enhancement layer are replaced with values of pixels of a reference image in the base layer at corresponding locations. The latter mode is employed when the enhancement layer corresponds to a partial area of the base layer and the encoding is performed in a spatially scalable manner. The scalability mode and the replacement mode are both determined in advance. fill_mode is a one-bit flag indicating the replacement mode and is supplied from the pixel replacement circuit 231 to the variable-length encoder 6. The flag fill_mode is encoded by the variable-length encoder 6 and transmitted.

If it is concluded that the flag fill_mode indicating the replacement mode has a value equal to 0, then the process goes to step S66 in which the pixel replacement circuit 231 performs replacement in the same manner as that (FIG. 9) performed by the pixel replacement circuit 221 (FIG. 8) in the base layer. The resultant image signal is output to the set of frame memories 11.

In the case where the flag fill_mode indicating the replacement mode has a value equal to 1, the process goes to step S67 in which the pixel replacement circuit 231 replaces the pixel values in the enhancement layer with the pixel values of the base layer reference image signal at corresponding locations. This replacement method is described in further detail below with reference to FIG. 13.
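The complete decision flow of FIG. 36 can be sketched as one function. The intraframe extrapolation of FIG. 9 is not reproduced in this part of the text, so `extrapolate_intraframe` below is a hypothetical stand-in (copy the nearest object pixel along each row, then fill empty rows from the nearest filled row). The resolution-converted base layer reference is assumed to be already aligned with the enhancement layer VOP.

```python
def extrapolate_intraframe(pixels, key_sig):
    """Hypothetical stand-in for the FIG. 9 procedure (not the patent's method)."""
    h, w = len(pixels), len(pixels[0])
    rows, filled = [], []
    for y in range(h):
        cols = [x for x in range(w) if key_sig[y][x] != 0]
        if not cols:
            rows.append(None)  # no object pixel in this row yet
            continue
        # Each pixel takes the value of the nearest object pixel in its row.
        rows.append([pixels[y][min(cols, key=lambda c: abs(c - x))]
                     for x in range(w)])
        filled.append(y)
    if not filled:
        return [[0] * w for _ in range(h)]
    return [rows[y] if rows[y] is not None
            else list(rows[min(filled, key=lambda f: abs(f - y))])
            for y in range(h)]


def replace_pixels(pixels, key_sig, enhancement_type, fill_mode, upsampled_base):
    """Sketch of the FIG. 36 flow of the pixel replacement circuit 231."""
    if enhancement_type == 0:                                  # S61 -> S66
        return extrapolate_intraframe(pixels, key_sig)
    masked = [[p if k != 0 else 0 for p, k in zip(pr, kr)]    # S62-S64
              for pr, kr in zip(pixels, key_sig)]
    if fill_mode == 0:                                         # S65 -> S66
        return extrapolate_intraframe(masked, key_sig)
    # S67: outside the object, use co-located pixels of the upsampled base layer.
    return [[p if k != 0 else b for p, k, b in zip(pr, kr, br)]
            for pr, kr, br in zip(masked, key_sig, upsampled_base)]
```

With fill_mode equal to 1, pixels outside the object take the co-located values of the up-sampled base layer reference, which is the behaviour described for step S67.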

In an area containing an image in which corresponding 
key signals have a value not equal to 0 (for example the 
image object area in FIG. 13), the image in the enhancement 
layer is directly employed. In the other area (the area in 
which horizontal lines are drawn in FIG. 13), the reference 
image is obtained by replacing pixel values in the enhance- 
ment layer with pixel values of the base layer image which 
has been converted in resolution (subjected to an up-sampling process) at locations corresponding to the locations of the reference image (UVOP0 in FIG. 13).

Examples of pixel replacement for three layers are shown in FIGS. 37 and 38. In the example shown in FIG. 37, the VOP in the base layer (VOL0) is an image having a rectangular shape (video_object_layer_shape!=00), the enhancement layer (VOL1) is a part of the base layer (VOL0), and the VOP in the enhancement layer has an arbitrary shape (video_object_layer_shape=00). The second enhancement layer (VOL2) has an image corresponding to the same area as VOL0, that is, to an area wider than its prediction reference layer (VOL1), and the image of the second enhancement layer has a rectangular shape. In VOL1, the flag fill_mode indicating the pixel replacement mode is set to "0", and thus pixel replacement is performed by intraframe extrapolation as shown in the flow chart of FIG. 9.

In the example shown in FIG. 38, the VOP in the base layer (VOL0) is an image having a rectangular shape, the enhancement layer (VOL1) is a part of the base layer (VOL0), and the VOP in the enhancement layer has an arbitrary shape. The second enhancement layer (VOL2) has an image corresponding to the same area as VOL0, that is, to an area wider than its prediction reference layer (VOL1), and the image of the second enhancement layer has a rectangular shape. In VOL2, the flag fill_mode indicating the pixel replacement mode is set to "0", while the flag fill_mode is set to "1" in VOL1. In this case, the pixel values in VOL1 are replaced by the corresponding pixel values in VOL0.

The difference between the examples shown in FIGS. 37 and 38 is described below. In both examples, the layer VOL2 is encoded with reference to the layer VOL1, although VOL1 corresponds to only a part of VOL2. In the example shown in FIG. 37, the pixel replacement in the area in which there is no image, and thus in which the corresponding key signals are equal to 0, is performed by means of intraframe extrapolation as shown in FIG. 9. As a result, in the area of VOL1 in which the key signals are equal to 0, signals which have no relation to the corresponding area of VOL2 are employed as prediction reference signals.

On the other hand, in the example shown in FIG. 38, the pixel values in the area of VOL1 in which there is no image, and thus in which the corresponding key signals are equal to 0, are replaced with the pixel values in the base layer VOL0 at corresponding locations. Thus, in the area of VOL1 in which key signals are equal to 0, low-resolution image signals of VOL2 at corresponding locations are employed as prediction reference signals.

Thus, when encoding efficiency is important in the encoding process, the flag fill_mode is set to "1".

Referring again to FIG. 35, the first enhancement layer encoder 203 is further described. Flags ref_layer_id and enhancement_type are supplied from the layered image signal generator 201 (FIG. 34) to the variable-length encoder 6 and are inserted at predetermined locations in the bit stream. The bit stream is then output via the transmission buffer 7.

The flag enhancement_type is supplied to the pixel replacement circuit 231. In accordance with this flag, the pixel replacement circuit 231 performs pixel replacement as described above.

The flag ref_layer_id is supplied to the motion vector extraction circuit 232 and the motion compensation circuit 12. The motion vector extraction circuit 232 sets the flag ref_select_code to a value depending on the predetermined picture type and supplies it to the motion compensation circuit 12 and the variable-length encoder 6.

If the layer under consideration is a layer other than the highest layer, for example the first enhancement layer of the three layers, then the image signal output from the pixel replacement circuit 231 and also the image signal output from the set of frame memories 11 are supplied to the second enhancement layer encoder 208 via the resolution converter 209.

When the layer under consideration is the second enhancement layer, which is the highest of the three layers, no output signal is supplied from the set of frame memories 11 or the pixel replacement circuit 231 to the encoders in any other layers.

Except for the point described above, the first enhancement layer encoder 203 of the second embodiment operates in a manner similar to the enhancement layer encoder 203 of the first embodiment.

FIG. 39 illustrates an example of the circuit configuration of the VOP decoder 112 corresponding to the VOP encoder 103 shown in FIG. 34. Herein, the part used to decode the base layer and the first enhancement layer is constructed in the same manner as in the first embodiment (FIG. 18). The VOP decoder 112 of this second embodiment includes an additional part which is not included in the decoder shown in FIG. 18 and which is used to decode the second enhancement layer. The additional part includes a delay circuit 256, a second enhancement layer decoder 257, and a resolution converter 258. The first enhancement layer decoder 253 and the second enhancement layer decoder 257 are substantially equal in construction to each other.

A bit stream is first input to a demultiplexer 251. The demultiplexer 251 demultiplexes the received bit stream into separate bit streams in the respective layers, and outputs the resultant bit streams. In the specific example shown in FIG. 39, the decoder is adapted to perform scalable decoding on three layers, and thus the input bit stream is separated into a second enhancement layer bit stream, a first enhancement layer bit stream, and a base layer bit stream.

The base layer bit stream is directly supplied to the base layer decoder 254. On the other hand, the first enhancement layer bit stream is supplied to the first enhancement layer decoder 253 via the delay circuit 252. The second enhancement layer bit stream is supplied to the second enhancement layer decoder 257 via the delay circuit 256.

The delay circuits 252 and 256 delay the first and second enhancement layer bit streams by the time required for the base layer decoder 254 to decode one VOP, and then output the delayed bit streams to the first and second enhancement layer decoders 253 and 257, respectively.

The base layer decoder 254 is constructed in the same manner as the base layer decoder of the first embodiment (FIG. 19). The decoded image signal and key signal output from the base layer decoder 254 are supplied to the VOP reconstruction circuit 259. The flags FPOS_B and FSZ_B indicating the position and the size of the VOP decoded by the base layer decoder 254 are also supplied to the VOP reconstruction circuit 259.

The decoded base layer image signal and key signal are also supplied to the resolution converter 255 and are converted in resolution. The resultant signals are supplied to the first enhancement layer decoder 253. The flags FPOS_B and FSZ_B indicating the absolute coordinate position and the size of the decoded base layer VOP are also supplied to the first enhancement layer decoder 253.
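The role of the demultiplexer 251 can be sketched as splitting one multiplexed stream into per-layer streams while preserving order within each layer. The packet format below, a sequence of (layer_id, payload) pairs with layer 0 as the base layer, 1 as the first enhancement layer and 2 as the second, is a hypothetical simplification; the actual multiplexing syntax is not described in this passage.

```python
from collections import defaultdict


def demultiplex(packets):
    """Split (layer_id, payload) packets into one ordered stream per layer.

    layer_id 0 = base layer, 1 = first enhancement, 2 = second enhancement
    (an assumed numbering for this sketch).
    """
    streams = defaultdict(list)
    for layer_id, payload in packets:
        streams[layer_id].append(payload)
    return dict(streams)
```

In the decoder of FIG. 39, stream 0 would go directly to the base layer decoder 254, while streams 1 and 2 would pass through the delay circuits 252 and 256 first.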
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The first enhancement layer bit stream generated by the
demultiplexer 251 is supplied to the first enhancement layer
decoder 253 via the delay circuit 252.

On the other hand, the second enhancement layer bit
stream generated by the demultiplexer 251 is supplied to the
second enhancement layer decoder 257 via the delay circuit
256.

The delay circuits 252 and 256 are constructed in the same
manner. Furthermore, the first enhancement layer decoder
253 and the second enhancement layer decoder 257 are
constructed in the same manner.

The decoded image signal and key signal output from the
first enhancement layer decoder 253 are supplied to the VOP
reconstruction circuit 259 and the resolution converter 258.
The flags FPOS_E1 and FSZ_E1 indicating the position
and the size of the VOP decoded by the first enhancement
layer decoder 253 are also supplied to the VOP reconstruc-
tion circuit 259.

The decoded first enhancement layer image signal and
key signal are also supplied to the resolution converter 258
and are converted in resolution. The resultant signals are
supplied to the second enhancement layer decoder 257.
The flags FPOS_E1 and FSZ_E1 indicating the absolute
coordinate position and the size of the decoded first
enhancement layer VOP are also supplied to the second
enhancement layer decoder 257.

The flag FR1 indicating the resolution conversion ratio
decoded by the first enhancement layer decoder 253 is
supplied to the resolution converter 255. In accordance with
the decoded flag FR1 indicating the resolution conversion
ratio, resolution conversion is performed by the resolution
converter 255.

The decoded image signal and key signal output from the
second enhancement layer decoder 257 are supplied to the VOP
reconstruction circuit 259. The flags FPOS_E2 and FSZ_E2
indicating the position and the size of the VOP decoded by
the second enhancement layer decoder 257 are also supplied
to the VOP reconstruction circuit 259.

The flag FR2 indicating the resolution conversion ratio
decoded by the second enhancement layer decoder 257 is
supplied to the resolution converter 258. In accordance with
the decoded flag FR2 indicating the resolution conversion
ratio, the resolution converter 258 performs resolution con-
version.

The details of the VOP reconstruction circuit 259 will be
described later.

Referring now to FIG. 40, the first enhancement layer
decoder 253 is described. In FIG. 40, elements similar to
those in FIG. 45 or 20 are denoted by similar reference
numerals. The second enhancement layer decoder 257 has a
circuit construction similar to that of the first enhancement
layer decoder 253, and thus the description of the first
enhancement layer decoder 253 given below also holds
for the second enhancement layer decoder 257.

The first enhancement layer decoder 253 according to the
second embodiment is similar to the enhancement layer
decoder 253 (FIG. 20) according to the first embodiment
except for the pixel replacement circuit 271.

After the enhancement layer bit stream is stored tempo-
rarily in a reception buffer 21, the enhancement layer bit
stream is supplied to a variable-length decoder 22. The
variable-length decoder 22 performs variable-length decod-
ing on the enhancement layer bit stream supplied from the
reception buffer 21, thereby supplying a motion vector and
information representing the prediction mode to a motion
compensation circuit 27, information representing the quan-
tization step to an inverse quantization circuit 23, and the
variable-length decoded data to the inverse quantization
circuit 23.

The variable-length decoder 22 also decodes the flag
FSZ_E1 indicating the size of the VOP and the flag FPOS_E1
indicating the absolute coordinate position thereof, and
supplies the decoded flags to the motion compensation
circuit 27, a set of frame memories 26, a key signal decoder
274, and the VOP reconstruction circuit 259 shown in FIG. 39.

Furthermore, the variable-length decoder 22 also decodes
the flag indicating the ratio of the size (resolution) of
the enhancement layer image VOP to the size (resolution) of
the base layer image VOP, and supplies the decoded flag to the
motion compensation circuit 27 and the resolution converter
255 shown in FIG. 39.

According to the flag FR1 indicating the size ratio
(magnification), the resolution converter 255 performs reso-
lution conversion on the decoded base layer image signal
and its associated key signal using a filter. The resultant
signals are supplied to a set of frame memories 273 in the
enhancement layer decoder 253.

The variable-length decoder 22 also decodes the flag
ref_layer_id indicating the reference layer used in
prediction, and the flag ref_select_code, and supplies the
result to the motion compensation circuit 27. Still
further, the variable-length decoder 22 also decodes the
flag fill_mode indicating the replacement mode and sup-
plies the result to a pixel replacement circuit 271. The
variable-length decoder 22 also extracts the key signal bit
stream and supplies the extracted key signal bit stream to the
key signal decoder 274.

The variable-length decoder 22 also decodes the flag
enhancement_type indicating whether the layer under con-
sideration corresponds to the entire area or a partial
area of the reference layer, and supplies it to the pixel
replacement circuit 271 and the VOP reconstruction circuit
259 shown in FIG. 39.

The key signal decoder 274 decodes the key signal bit
stream supplied from the variable-length decoder 22 in
accordance with a decoding method corresponding to the
encoding method employed. The decoded key signal is
supplied to an IDCT circuit 24, the motion compensation
circuit 27, and the pixel replacement circuit 271.

The inverse quantization circuit 23 performs inverse
quantization on the data (quantized DCT coefficients) sup-
plied from the variable-length decoder 22 block by block in
accordance with the quantization step also supplied from the
variable-length decoder 22. The resultant signal is supplied
to the IDCT circuit 24. The IDCT circuit 24 performs an
inverse DCT process on the data (DCT coefficients) output
by the inverse quantization circuit 23, and supplies the
resultant data to an arithmetic operation circuit 25.

In the case where the image signal supplied from the
IDCT circuit 24 is I-picture data, the image signal is directly
output via the arithmetic operation circuit 25 without being
subjected to any process, and is stored in the set of frame
memories 26 via the pixel replacement circuit 271 for use in
generating a predicted image signal of an image signal
which will be input later to the arithmetic operation circuit
25. The image signal output from the arithmetic operation
circuit 25 is directly output to the VOP reconstruction
circuit 259 shown in FIG. 39.

When the image signal supplied from the IDCT circuit 24
is a P-picture or a B-picture, the motion compensation
circuit 27 generates a predicted image signal from the image
signal stored in the set of frame memories 26 or 273 in
accordance with the motion vector, the prediction mode, and the
flags ref_layer_id and ref_select_code supplied from the
variable-length decoder 22, and outputs the resultant signal
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to the arithmetic operation circuit 25. The arithmetic opera- 
tion circuit 25 adds the predicted image signal supplied from 
the motion compensation circuit 27 to the image signal 
supplied from the IDCT circuit 24 thereby creating a repro- 
duced image signal. When the image signal supplied from 
the IDCT circuit 24 is a P-picture, the image signal output 
from the arithmetic operation circuit 25 is also stored in the 
set of frame memories 26 via the pixel replacement circuit 
271 so that it can be used as a reference image in the process 
of decoding a subsequent image signal. However, in the case 
of an intra macroblock, the arithmetic operation circuit 25
simply transfers the image signal supplied from the IDCT 
circuit 24 to its output without performing any process on it. 
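The behaviour of the arithmetic operation circuit 25 described above can be sketched as follows. This is a minimal illustration rather than the circuit itself: blocks are plain Python lists of rows, the function name `reconstruct_block` is hypothetical, and clipping the sum to the 8-bit range 0 to 255 is an assumption not stated in the text.

```python
def reconstruct_block(residual, prediction=None, intra=False):
    """Sketch of the arithmetic operation circuit 25 for one block.

    residual   -- IDCT output (list of rows of ints)
    prediction -- motion-compensated predicted block from circuit 27
    intra      -- True for I-picture data or an intra macroblock
    """
    if intra or prediction is None:
        # I-picture data / intra macroblock: passed through unchanged.
        return [row[:] for row in residual]
    # P- or B-picture: add the predicted image signal to the IDCT output.
    # Clipping to 0..255 is an assumption (8-bit samples).
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]


# Usage: a 1x3 inter block and an intra block.
print(reconstruct_block([[10, -5, 200]], [[100, 3, 100]]))  # [[110, 0, 255]]
print(reconstruct_block([[7, 8]], intra=True))              # [[7, 8]]
```

In a P-picture the returned block would additionally be written back to the frame memories 26 through the pixel replacement circuit 271; that step is omitted here.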

In accordance with the decoded flag fill_mode indicat-
ing the replacement mode, the pixel replacement circuit 271
performs pixel replacement in a similar manner to the pixel
replacement circuit 231 (FIG. 35) in the encoder (as shown
in the flow chart of FIG. 36).
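The two replacement modes that the pixel replacement circuits 231 and 271 choose between (copying pixels at corresponding locations from the resolution-converted base layer, or intraframe extrapolation) can be sketched as below. Assumptions: images and key signals are lists of rows, a nonzero key value marks a pixel inside the object, the base layer has already been converted to the enhancement-layer resolution, and the extrapolation is a simplified MPEG-4-style repetitive padding swapped in for illustration, so the exact procedure of FIG. 36 may differ. The function names are hypothetical, and the mapping of fill_mode values onto the two modes is left to the caller.

```python
def replace_outside_pixels(enh, mask, base_up=None):
    """Build a reference image for motion vector detection.

    enh     -- enhancement-layer image (list of rows of ints)
    mask    -- key signal of the same shape; nonzero = inside the object
    base_up -- base-layer image already converted to the enhancement
               resolution; if given, outside pixels are copied from the
               same locations, otherwise they are extrapolated.
    """
    if base_up is not None:
        return [
            [e if m else b for e, m, b in zip(e_row, m_row, b_row)]
            for e_row, m_row, b_row in zip(enh, mask, base_up)
        ]
    return pad_by_extrapolation(enh, mask)


def pad_by_extrapolation(enh, mask):
    """Simplified repetitive padding (assumption, not FIG. 36 verbatim)."""
    h, w = len(enh), len(enh[0])
    out = [[None] * w for _ in range(h)]
    # Horizontal pass: fill each outside pixel from the nearest inside pixels
    # in the same row (rounded average when bounded on both sides).
    for y in range(h):
        inside = [x for x in range(w) if mask[y][x]]
        for x in range(w):
            if mask[y][x]:
                out[y][x] = enh[y][x]
                continue
            left = max((i for i in inside if i < x), default=None)
            right = min((i for i in inside if i > x), default=None)
            if left is not None and right is not None:
                out[y][x] = (enh[y][left] + enh[y][right] + 1) // 2
            elif left is not None:
                out[y][x] = enh[y][left]
            elif right is not None:
                out[y][x] = enh[y][right]
    # Vertical pass: rows without any inside pixel copy the nearest filled row.
    filled = [y for y in range(h) if any(mask[y])]
    if not filled:
        return [[0] * w for _ in range(h)]  # assumption: no object -> zeros
    for y in range(h):
        if y not in filled:
            nearest = min(filled, key=lambda r: abs(r - y))
            out[y] = out[nearest][:]
    return out


enh = [[10, 20, 30, 40], [50, 60, 70, 80]]
mask = [[0, 1, 0, 1], [0, 0, 0, 0]]
base_up = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(replace_outside_pixels(enh, mask, base_up))  # [[1, 20, 3, 40], [5, 6, 7, 8]]
print(replace_outside_pixels(enh, mask))           # [[20, 20, 30, 40], [20, 20, 30, 40]]
```

Copying from the base layer needs only a per-pixel selection, whereas extrapolation needs two passes over the image; this is consistent with the later remark that the base-layer mode allows simpler memory sharing in the decoder.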

If the flag FR indicating the size ratio is equal to 1 and if
ref_select_code="00", then the motion compensation cir-
cuit 27 generates a predicted image signal in accordance
with the motion vector and the information representing the
prediction mode supplied from the time-coincident base
layer VOP, and supplies the resultant signal to the arithmetic
operation circuit 25.
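The condition in the preceding paragraph amounts to a simple selection rule. In this sketch the function name, the argument layout, and the prediction-mode strings are hypothetical, and motion vectors are represented as (dx, dy) tuples.

```python
def select_motion_info(fr, ref_select_code, enh_mv, enh_mode, base_mv, base_mode):
    """Choose the motion vector and prediction mode used by circuit 27.

    When the size ratio FR is 1 and ref_select_code is "00", the motion
    vector and prediction mode of the time-coincident base layer VOP are
    reused; otherwise the enhancement layer's own decoded values are used.
    """
    if fr == 1 and ref_select_code == "00":
        return base_mv, base_mode
    return enh_mv, enh_mode


print(select_motion_info(1, "00", (1, 2), "forward", (3, 4), "backward"))
# ((3, 4), 'backward')
```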

In FIG. 39, the decoded enhancement layer image signal,
key signal, flag FSZ_E1 indicating the size of the enhance-
ment layer VOP, and the flag FPOS_E1 indicating the
absolute coordinate position of the enhancement layer VOP
are supplied to the VOP reconstruction circuit 259.

The VOP reconstruction circuit shown in FIG. 39 is 
described in further detail below. FIG. 41 illustrates an 
example of the circuit configuration of the VOP reconstruc- 
tion circuit 259. Although in this specific example the VOP 
reconstruction circuit 259 is a part of the VOP decoder 112 
shown in FIG. 39, the VOP reconstruction circuit 259 is also
a part of the image reconstruction circuit 113 in the image 
signal decoder shown in FIG. 52. Image signals, key signals, 
flags FR indicating the size ratio relative to the prediction 
reference layer, flags FSZ and FPOS indicating the size and 
position of VOPs, which are output from the decoders in the 
respective layers, are first input to resolution converters 311 
to 313, and converted in resolution at the specified ratio. 

Which layer is employed as a final decoded output signal 
is specified by a flag D_M which is set in accordance with 
an instruction externally given by a user. The flag D_M is
supplied to a layer selection circuit 317. 

The resolution converters 311 to 313 determine conver- 
sion ratios in accordance with the layer to be displayed and 
the flag FR indicating the size ratio relative to the prediction 
reference layer. The conversion ratios are determined start- 
ing with the highest layer. That is, in accordance with the 
flag which is given by a user from the outside to indicate the 
layer to be displayed, the resolution conversion ratio for the 
highest layer to be displayed is set to 1. The conversion ratio 
for a layer used as a prediction reference layer by the highest 
layer to be displayed is then determined in accordance with 
the flag FR indicating the ratio relative to the prediction 
reference layer transmitted in the highest layer. That is, the 
conversion ratio is set to be equal to FR. The conversion 
ratio for a layer which is further referred to by the above 
prediction reference layer is set to the conversion ratio of 
this layer times FR of this layer. The conversion ratios are 
determined for other layers in a similar manner. 
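The rule for deriving the conversion ratios can be sketched as a walk down the chain of prediction reference layers, starting from the displayed layer. The layer identifiers and the two dictionaries (`ref_layer`, mapping each layer to its prediction reference layer, and `fr`, holding each layer's FR) are hypothetical representations of the decoded flags; `Fraction` from the Python standard library keeps non-integer ratios exact.

```python
from fractions import Fraction


def conversion_ratios(display_layer, ref_layer, fr):
    """Resolution conversion ratio for the displayed layer and every layer
    it depends on through prediction.

    ref_layer -- dict: layer -> its prediction reference layer (None for base)
    fr        -- dict: layer -> FR, its size ratio relative to that reference
    """
    ratios = {display_layer: Fraction(1)}  # highest displayed layer: ratio 1
    layer = display_layer
    while ref_layer.get(layer) is not None:
        ref = ref_layer[layer]
        # Ratio of the reference layer = ratio of this layer times FR of this layer.
        ratios[ref] = ratios[layer] * Fraction(fr[layer])
        layer = ref
    return ratios


# Base layer 0, enhancement layers 1 and 2, each doubling the resolution.
ref_layer = {0: None, 1: 0, 2: 1}
fr = {1: 2, 2: 2}
print(conversion_ratios(2, ref_layer, fr))  # ratios 1, 2, 4 for layers 2, 1, 0
```

Layers above the displayed one never enter the walk, matching the text's statement that the ratios are determined starting from the highest layer to be displayed.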

After being converted in resolution, the image signals, 
key signals, and signals FSZ and FPOS indicating the size 
and the position of VOPs are supplied to the sets of frame 
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memories 314 to 316 and stored therein. These signals are 
then read out from the sets of frame memories in the 
predetermined order. 

The flag D_M indicating which layer is to be displayed
is input to the layer selection circuit 317. In accordance with
the flag D_M, the layer selection circuit 317 turns on a
switch corresponding to the layer to be displayed so that the
signals associated with that layer are supplied to an arith-
metic operation circuit 314. The switches associated with the
other layers are turned off so that decoded images in those
layers are not read out from the memories.

The image signals supplied via the layer selection circuit
317 are added together by the arithmetic operation circuit
314 in accordance with the respective key signals.

Furthermore, in the image reconstruction circuit 113 shown
in FIG. 52, an image signal is reconstructed from the image 
signals and key signals supplied from the respective VOP 
reconstruction circuits, and the resultant reconstructed 
image signal is output to the outside. 
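The key-signal-weighted addition used when decoded VOPs are combined can be illustrated with ordinary alpha blending. This is an assumption about the arithmetic, not a description taken from FIG. 41: the key signal is treated as an 8-bit opacity (0 transparent, 255 opaque), each VOP is written at the absolute position given by its FPOS flag, and the function name is hypothetical.

```python
def composite_vop(frame, vop, key, pos):
    """Blend one decoded VOP into the output frame in place.

    frame -- output image, list of rows of ints (modified and returned)
    vop   -- decoded VOP image signal, list of rows
    key   -- key signal of the same size as vop, 0..255 (assumed 8-bit)
    pos   -- (x, y) absolute position of the VOP's top-left corner (FPOS)
    """
    x0, y0 = pos
    for dy, (v_row, k_row) in enumerate(zip(vop, key)):
        y = y0 + dy
        if not 0 <= y < len(frame):
            continue  # outside the frame vertically
        for dx, (v, k) in enumerate(zip(v_row, k_row)):
            x = x0 + dx
            if 0 <= x < len(frame[y]):
                # Weighted sum of VOP and background, rounded to nearest.
                frame[y][x] = (k * v + (255 - k) * frame[y][x] + 127) // 255
    return frame


background = [[0, 0, 0], [0, 0, 0]]
print(composite_vop(background, [[200, 100]], [[255, 0]], (1, 1)))
# [[0, 0, 0], [0, 200, 0]]
```

Several VOPs would be blended in order from background to foreground; the layer chosen by D_M determines which decoded version of each VOP is passed in.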

FIG. 42 illustrates a modification of the VOP reconstruc-
tion circuit 259. In the VOP reconstruction circuit shown in
FIG. 42, it is assumed that the pixel replacement circuits 231
in the respective layers perform pixel replacement with
fill_mode="0" in the encoding process. In this case, the
decoders in the layers in which the flag fill_mode is set to
"0" supply only image signals to the VOP reconstruction
circuit 259.

The image signals input to the VOP reconstruction circuit
259 are applied to the layer selection circuit 317.

Also in this example, which layer is finally decoded and
displayed is specified by the flag D_M which is set in
accordance with an instruction externally given by a user.
The flag D_M is supplied to the layer selection circuit 317.
In accordance with the flag D_M, the layer selection
circuit 317 turns on a switch corresponding to the layer to be
displayed and turns off the other switches corresponding to
the layers which are not displayed so that no decoded image
signals in those layers are read out. In the example shown in
FIG. 42, only one switch is turned on and the other switches
are turned off, in any situation.

As described above, when encoding is performed with
fill_mode="0", it is possible to employ a simple VOP
reconstruction circuit such as that shown in FIG. 42. This
makes it possible to remove the sets of frame memories 314
to 316 shown in FIG. 41, and thus a reduction in cost can be
achieved.

FIG. 38 illustrates an example in which fill_mode="0".
In this case, the sets of frame memories 26 of the decoders
of both enhancement layers (VOL1, VOL2) store the image
signals in the same area, and the pixel replacement process is
performed using low-resolution image signals at the same
locations. Therefore, pixel replacement can be performed by
reading image signals in either one layer from the set of
frame memories 26. This means that the set of frame
memories 26 of the decoder (FIG. 40) and the sets of frame
memories 314 to 316 of the VOP reconstruction circuit 259
(FIG. 41) may be realized by a single set of frame memories
used for these purposes.

In contrast, when fill_mode="1" as is the case in the
example shown in FIG. 37, the areas of the respective layers
do not necessarily correspond to one another. Besides, pixel
replacement is performed by means of intraframe extrapo-
lation. For the above reasons, the set of frame memories 26
used for prediction in the decoder cannot be shared with the
sets of frame memories 314 to 316 of the VOP reconstruc-
tion circuit 259, and thus the construction shown in FIG. 41
is necessary.
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However, when fill_mode="0", if the same memory is 
employed for use as a prediction memory and also as a 
reconstruction memory, this construction is unsuitable for
editing image objects. For example, to replace only
the background image with another bit stream, it is more 
desirable to form the VOP reconstruction circuit 259 as 
shown in FIG. 41. 

Therefore, when it is desired to achieve a high encoding
efficiency or a small-scale circuit, pixel replacement is
performed with fill_mode="0", and encoding/decoding is
performed in a corresponding fashion. On the other hand,
when it is desired to re-edit images, encoding is performed
with fill_mode="1".

The syntax of the scalable encoding according to the
second embodiment is described below for the case of the
MPEG-4 VM (Verification Model). In the second
embodiment, the syntax is similar to that of the first embodi-
ment except for the syntax associated with the VOL.

FIG. 43 illustrates the syntax of VOL. As in the first
embodiment, fill_mode is a one-bit flag used to indicate
the replacement mode. When the flag fill_mode is equal to
1, pixel replacement is performed using a base layer image
which has been converted in resolution. This flag is trans-
mitted only in the enhancement layer. enhancement_type is
a one-bit flag used to indicate whether the corresponding
layer is a part of a prediction reference layer. When
enhancement_type="1", the corresponding layer is a part of
the prediction reference layer. In the other cases,
enhancement_type is set to "0".
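A decoder reading these two flags could look roughly like the sketch below. The `BitReader` class and `parse_vol_flags` function are hypothetical, the order in which fill_mode and enhancement_type appear is assumed (FIG. 43 gives the actual syntax), and both flags are assumed to be present only in enhancement-layer VOLs, although the text states this explicitly only for fill_mode.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_vol_flags(reader: BitReader, is_enhancement_layer: bool) -> dict:
    """Read the one-bit fill_mode and enhancement_type flags (assumed order)."""
    if not is_enhancement_layer:
        return {}  # assumed: not transmitted in the base layer
    fill_mode = reader.read_bits(1)
    enhancement_type = reader.read_bits(1)
    return {"fill_mode": fill_mode, "enhancement_type": enhancement_type}


print(parse_vol_flags(BitReader(bytes([0b10000000])), True))
# {'fill_mode': 1, 'enhancement_type': 0}
```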

A program used to execute the above-described processes 
may be transmitted to a user via a transmission medium. 
Transmission media available for this purpose include a 
recording medium such as a magnetic disk, a CD-ROM, and
a solid state memory, and a communication medium such as 
a network and a satellite communication system. 

As described above, in the image signal encoding method 
and the image signal encoding apparatus, an image signal 
decoding method and an image signal decoding apparatus, 
and the image signal transmission method, according to the 
present invention, a reference image is generated by replac- 
ing the pixels outside an image object in the enhancement 
layer with proper pixels in the base layer so that a motion 
vector is detected in a highly efficient fashion and so that
encoding efficiency is improved. This technique also allows
a reduction in calculation cost. 
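The benefit claimed here, efficient motion vector detection against a reference whose outside-object pixels have been replaced, rests on ordinary block matching. The text does not specify the search method, so the sketch below uses an exhaustive (full) search with a sum of absolute differences (SAD) criterion as an assumed stand-in; the function names, block size and search range are illustrative. The reference image is assumed to already have its outside-object pixels replaced, so every candidate position holds meaningful values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Return ((dx, dy), cost) of the best match within +/- search_range."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would fall outside the reference image.
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# The current image is the reference shifted left by one pixel.
ref = [[y * 6 + x for x in range(6)] for y in range(6)]
cur = [[ref[y][min(x + 1, 5)] for x in range(6)] for y in range(6)]
print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))  # ((1, 0), 0)
```

Without pixel replacement, candidate blocks overlapping the area outside the object would contain undefined values and distort the SAD; replacing those pixels is what lets a plain search like this behave sensibly near object boundaries.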

Although the present invention has been described above 
with reference to specific embodiments, the invention is not 
limited to these embodiments. Various modifications and
applications are possible without departing from the spirit
and scope of the invention. 

What is claimed is:

1. An image signal encoding apparatus for encoding a 
plurality of image signals, at least one of said plurality of 
image signals being an image signal representing a moving 
image object, said at least one of the plurality of image 
signals including a signal used to combine it with at least one 
other image signal of said plurality of image signals, said 
apparatus comprising: 
an image supplier for supplying a base layer image signal 
and an enhancement layer image signal scalably rep- 
resenting said image signal representing a moving 
image object; 

an enhancement layer encoder for encoding said enhance- 
ment layer image signal thereby generating an encoded 
enhancement layer signal; and 

a base layer encoder for encoding said base layer image 
signal thereby generating an encoded base layer signal; 
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wherein said enhancement layer encoder comprises: 
a generator for generating a reference image signal used 
to calculate a motion vector of the enhancement layer 
image signal to be encoded, said reference image signal 
5 being generated by replacing the values of pixels 
outside said image object of the enhancement layer 
image signal with the values of predetermined pixels at 
corresponding locations of the base layer image signal; 
a detector for detecting the motion vector of said enhance- 
ment layer image signal to be encoded using said 
reference image signal; and 
an encoder for encoding said enhancement layer image 
signal using a predicted image signal of said enhance- 
ment layer image signal, said predicted image signal 
being generated by performing motion compensation 
using said detected motion vector. 

2. An image signal encoding apparatus according to claim 
1, wherein said generator replaces the values of pixels 
outside said image object of the enhancement layer image 
signal with the values of pixels at corresponding locations of 
a base layer image signal which is time coincident with said 
reference image signal thereby generating said reference 
image signal used to calculate the motion vector of the 
enhancement layer image signal to be encoded.

3. An image signal encoding apparatus according to claim 
1, wherein said generator replaces the values of pixels 
outside said image object of the enhancement layer image 
signal with the values of pixels at corresponding locations of 
a base layer image signal which is time coincident with said 
image signal to be encoded thereby generating said refer- 
ence image signal used to calculate the motion vector of the 
enhancement layer image signal to be encoded. 

4. An image signal encoding apparatus according to claim 
1, wherein said enhancement layer encoder generates a flag 
indicating an image to be replaced. 

5. An image signal encoding apparatus according to claim 
1, wherein said image supplier includes a layered signal 
generator for generating said enhancement layer image 
signal and said base layer image signal represented in a 
scalable fashion from said image signal representing a 
moving image object. 

6. An image signal encoding apparatus according to claim 
1, wherein: 

said generator has a first replacement mode and a second
replacement mode;
in said first replacement mode, said generator replaces the
values of pixels outside said image object of the
enhancement layer image signal with the values of
predetermined pixels of the base layer image signal
thereby generating the reference image signal used to
calculate the motion vector of the enhancement layer
image signal to be encoded;
in said second replacement mode, said generator replaces
the values of pixels outside said image object of the
enhancement layer image signal with values obtained
by extrapolating pixel values inside said image object
thereby generating the reference image signal used to
calculate the motion vector of the enhancement layer
image signal to be encoded; and

said generator generates a flag indicating a replacement
mode.

7. An image signal encoding apparatus according to claim 
1, wherein said image supplier further supplies a flag indi-
cating the size of said enhancement layer image signal, a flag
indicating the position thereof with respect to an absolute
position, a flag indicating the size of said base layer image 
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signal, and a flag indicating the position thereof with respect 
to the absolute position. 

8. An image signal encoding apparatus according to claim 
7, wherein said image supplier further supplies a flag indi- 
cating the ratio of the resolution of said enhancement layer 
image signal to that of said base layer image signal. 

9. An image signal encoding apparatus according to claim 
1, wherein: said enhancement layer image signal is a first
enhancement layer image signal, said image supplier further
supplies a second enhancement layer image signal which is
higher in layer than said enhancement layer image signal,
said first and second enhancement layer image signals
representing differently scaled versions of said image signal
representing a moving image object; and 

said apparatus further comprises a second enhancement 
layer encoder for encoding said second enhancement 
layer image signal thereby generating a second encoded 
enhancement layer signal, 

said second enhancement layer encoder comprising: 

a generator for generating a second reference image signal 
used to calculate a motion vector of the second 
enhancement layer image signal to be encoded, said 
second reference image signal being generated by 
replacing the values of pixels outside said image object 
of said second enhancement layer image signal with the 
values of predetermined pixels of the first enhancement 
layer image signal; 

a detector for detecting the motion vector of said second 
enhancement layer image signal to be encoded using 
said second reference image signal; and 

a second encoder for encoding said second enhancement 
layer image signal using a second predicted image 
signal of said second enhancement layer image signal, 
said second predicted image signal being generated by 
performing motion compensation using said detected 
motion vector of the second enhancement layer image 
signal. 

10. An image signal encoding method for encoding a 
plurality of image signals, at least one of said plurality of 
image signals being an image signal representing a moving 
image object, said at least one of the plurality of image 
signals including a signal used to combine it with at least one 
other image signal of said plurality of image signals, said 
method comprising the steps of: 

supplying a base layer image signal and an enhancement 
layer image signal scalably representing said image 
signal representing a moving image object; 

encoding said enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and 

encoding said base layer image signal thereby generating 
an encoded base layer signal; 

wherein said step of encoding said enhancement layer 
image signal comprises the steps of: 

generating a reference image signal used to calculate a 
motion vector of the enhancement layer image signal to 
be encoded, said reference image signal being gener- 
ated by replacing the values of pixels outside said 
image object of the enhancement layer image signal 
with the values of predetermined pixels at correspond- 
ing locations of the base layer image signal; 

detecting the motion vector of said enhancement layer 
image signal to be encoded using said reference image 
signal; and 

encoding said enhancement layer image signal using a 
predicted image signal of said enhancement layer 
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image signal, said predicted image signal being gener- 
ated by performing motion compensation using said 
detected motion vector. 

11. The image signal encoding method of claim 10, 
wherein said enhancement layer signal is a first enhance- 
ment layer signal, and said method further comprises the 
steps of: 

supplying a second enhancement layer image signal 
which is higher in layer than said first enhancement 
layer image signal, said first and second enhancement 
layer image signals representing differently scaled ver- 
sions of said image signal representing a moving image 
object; 

encoding said second enhancement layer image signal 
thereby generating a second encoded enhancement 
layer signal, by: 

generating a second reference image signal used to cal- 
culate a motion vector of the second enhancement 
image signal, said second reference image signal being 
generated by replacing the values of pixels outside said 
image object of said second enhancement layer image 
signal with the values of predetermined pixels of the 
first enhancement layer image signal; 

detecting the motion vector of said second enhancement 
layer image signal to be encoded using said second 
reference image signal; 

encoding said second enhancement layer image signal 
using a second predicted image signal of said second 
enhancement layer image signal, said second predicted 
image signal being generated by performing motion 
compensation using said detected motion vector of said 
second enhancement layer image signal; and 

transmitting said encoded second layer image signal. 

12. An image signal transmission method for encoding a 
plurality of image signals, at least one of said plurality of 
image signals being an image signal representing a moving 
image object, said at least one of the plurality of image 
signals including a signal used to combine it with at least one 
other image signal of said plurality of image signals, said 
method comprising the steps of: 

supplying a base layer image signal and an enhancement 
layer image signal scalably representing said image 
signal representing a moving image object; 

encoding said enhancement layer image signal thereby 
generating an encoded enhancement layer signal; and 

encoding said base layer image signal thereby generating 
an encoded base layer signal; 

wherein said step of encoding said enhancement layer 
image signal comprises the steps of: 

generating a reference image signal used to calculate a 
motion vector of the enhancement layer image signal to 
be encoded, said reference image signal being gener- 
ated by replacing the values of pixels outside said 
image object of the enhancement layer image signal 
with the values of predetermined pixels at correspond- 
ing locations of the base layer image signal; 

detecting the motion vector of said enhancement layer 
image signal to be encoded using said reference image 
signal; 

encoding said enhancement layer image signal using a 
predicted image signal of said enhancement layer 
image signal, said predicted image signal being gener- 
ated by performing motion compensation using said 
detected motion vector; and 

generating a flag indicating an image to be replaced; 



10/30/2002, EAST Version: 1.03.0002 



us 6,173, 


said method further comprising the step of transmitting 
said encoded enhancement layer image signal, said 
encoded base layer image signal, said motion vector, 
and said flag. 

13. The image signal transmission method of claim 12,
wherein said enhancement layer signal is a first enhance- 
ment layer signal, and said method further comprises the 
steps of: 

supplying a second enhancement layer image signal 
which is higher in layer than said first enhancement 
layer image signal, said first and second enhancement 
layer image signals representing differently scaled ver- 
sions of said image signal representing a moving image 
object; 

encoding said second enhancement layer image signal 
thereby generating a second encoded enhancement 
layer signal, by: 
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generating a second reference image signal used to cal- 
culate a motion vector of the second enhancement 
image signal, said second reference image signal being 
generated by replacing the values of pixels outside said 
image object of said second enhancement layer image 
signal with the values of predetermined pixels of the 
first enhancement layer image signal; 
detecting the motion vector of said second enhancement 
layer image signal to be encoded using said second 
reference image signal; and 
encoding said second enhancement layer image signal 
using a second predicted image signal of said second 
enhancement layer image signal, said second predicted 
image signal being generated by performing motion 
compensation using said detected motion vector of said 
second enhancement layer image signal. 
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